Teradata Statistics – The USING Option In Teradata 14

Teradata 14.00 offers three enhancements that enable you to customize statistic collection to suit your requirements better:

COLLECT STATISTICS USING SAMPLE n PERCENT COLUMN(column_name) ON table_name;
COLLECT STATISTICS USING MAXINTERVALS n COLUMN(column_name) ON table_name;
COLLECT STATISTICS USING MAXVALUELENGTH n COLUMN(column_name) ON table_name;

The USING SAMPLE option

Previously, a single global percentage for sampling table rows was applied to all statistics, even when SAMPLE statistics were already available. However, with Teradata 14.00, you now have the ability to specify a unique sample size for each collection statement.

The USING MAXVALUELENGTH option

Teradata statistics information is created by considering a limited number of bytes in a column, starting from the leftmost character.

Teradata 14.00 increased the differentiation threshold for individual row entries from 16 to 25 bytes. However, columns that consume more bytes are still considered equal. Additionally, character columns no longer undergo padding with spaces, which used to occur for all row entries consuming fewer than the threshold level of bytes prior to Teradata 14.00.

Although the default value is presently 25 bytes, you may encounter wider columns, such as VARCHAR(2000), in which most entries vary primarily in details towards the right end of the string.

To accurately distinguish between these entries, character counts beyond position 25 and up to 2000 must be included for histogram generation. To accomplish this, the MAXVALUELENGTH option can be utilized to expand the number of bytes analyzed.

The USING MAXINTERVALS option

A new Teradata statistics feature pertains to the number of intervals utilized in constructing the statistics histogram. Each interval incorporates the most common value and its corresponding record count within the interval.

In prior versions, the histogram class number remained constant (currently set to 250 intervals by default on Teradata 14); however, it is now possible to select a value ranging from 0 to 500. What benefits can be derived from altering the interval count?

Increasing the number of classes in the histogram improves the chances of capturing localized value concentrations. This is particularly beneficial for optimization tasks when the distribution of row entries displays numerous sharp peaks and valleys concentrated in a small range of potential values. Aggregating them into a single histogram class can lead to distortion.

Related Services

⚡ Need Help Optimizing Your Data Platform?

We cut data platform costs by 30–60% without hardware changes. 25+ years of hands-on tuning experience.

Explore Our Services →

📋 Considering a Move From Teradata?

Get a personalized migration roadmap in 2 minutes. We have migrated billions of rows from Teradata to Snowflake, Databricks, and more.

Free Migration Assessment →

Improvements in Tailoring Statistics Collection with Teradata 14.00: Using Sample, MaxValueLength, and MaxIntervals Options

📊 Data Platform Migration Survey

Stay Ahead in Data Warehousing

Leave a Comment Cancel reply