Teradata 14.00 offers three enhancements that enable you to customize statistic collection to suit your requirements better:
COLLECT STATISTICS USING SAMPLE n PERCENT COLUMN(column_name) ON table_name;
COLLECT STATISTICS USING MAXINTERBALS n COLUMN(column_name) ON table_name;
COLLECT STATISTICS USING MAXVALUELENGTH n COLUMN(column_name) ON table_name;
The USING SAMPLE option
Previously, a single global percentage for sampling table rows was applied to all statistics, despite already having available SAMPLE statistics. However, with Teradata 14.00, you now have the ability to specify a unique sample size for each collection statement.
The USING MAXVALUELENGTH option
Teradata statistics information is created by considering a limited number of bytes in a column, starting from the leftmost character.
Teradata 14.00 increased the differentiation threshold for individual row entries from 16 to 25 bytes. However, columns that consume more bytes are still considered equal. Additionally, character columns no longer undergo padding with spaces, which used to occur for all row entries consuming fewer than the threshold level of bytes prior to Teradata 14.00.
Although the default value is presently 25 bytes, you may encounter broader columns, such as VARCHAR(2000), in which most entries vary primarily in details towards the right end of the string.
To accurately distinguish between these entries, character counts beyond position 25 and up to 2000 must be included for histogram generation. To accomplish this, the MAXVALUELENGTH option can be utilized to expand the number of bytes analyzed.
The USING MAXINTERVALS option
A new Teradata statistics feature pertains to the number of intervals utilized in constructing the statistics histogram. Each interval incorporates the most common value and its corresponding record count within the interval.
In prior versions, the histogram class number remained constant (currently set to 250 intervals by default on Teradata 14); however, it is now possible to select a value ranging from 0 to 500. What benefits can be derived from altering the interval count?
Increasing the number of classes in the histogram improves the chances of capturing localized value concentrations. This is particularly beneficial for optimization tasks when the distribution of row entries displays numerous sharp peaks and valleys concentrated in a small range of potential values. Aggregating them into a single histogram class can lead to distortion.