When assessing how accurate a table's statistics are, people often rely solely on the collection date. That date alone does not always tell the whole story. A static lookup table, for example, may still carry accurate Teradata statistics even though they have not been refreshed. Conversely, other tables change unevenly: some columns take on an ever wider range of values, while others keep the same limited set of codes.
To validate your table’s statistics, run a simple query against the column in question using a WHERE condition. For instance:
SELECT * FROM TABLE WHERE TEST_COLUMN = 'X';
Compare the estimated row count, visible in Teradata Administrator, with the number of rows the query actually returns. If the discrepancy exceeds a single-digit percentage of the table’s overall size, the statistics likely need to be refreshed.
Remember that the optimizer’s estimates will not always be accurate down to the individual row. Nonetheless, the gap between the estimated and the actual row count should not be substantial.
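If you would rather check the estimate in SQL than in Teradata Administrator, the optimizer’s expected row count also appears in the EXPLAIN output of the same query. A minimal sketch follows; the table and column names (SALES, STORE_ID) are hypothetical, and the exact wording of the EXPLAIN text varies by Teradata release.

EXPLAIN SELECT * FROM SALES WHERE STORE_ID = 42;
-- The retrieve step contains a phrase such as
-- "... estimated with high confidence to be 1,250 rows ..."

SELECT COUNT(*) FROM SALES WHERE STORE_ID = 42;
-- Compare the actual count with the estimate above. If they diverge
-- noticeably, refresh the statistics on that column:

COLLECT STATISTICS ON SALES COLUMN (STORE_ID);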
Hi Alan,
One example comes to mind:
Statistics can become stale when the Teradata system configuration changes (for example, because nodes are added) and the rows of a PPI table are consequently assigned to different partitions.
This happens, for example, if you use Teradata's hashing functions in the partitioning definition of your table DDL:
HASHAMP(HASHBUCKET(HASHROW(column)))
Because a bigger system has more AMPs, the rows are distributed differently across the partitions, so the statistics become stale and have to be recollected.
This is an example where the table content stayed the same but the statistics still became stale.
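To make this concrete, here is a minimal DDL sketch that follows the hashing expression quoted above, plus the recollection that becomes necessary after a reconfiguration. The table and column names are hypothetical, and the statements are illustrative rather than verified against a particular Teradata release.

CREATE TABLE SALES_BY_AMP
(
    SALE_ID   INTEGER NOT NULL,
    STORE_ID  INTEGER,
    SALE_DATE DATE
)
PRIMARY INDEX (SALE_ID)
-- Partitioning expression built on the hashing functions; the AMP
-- number it produces depends on the current system configuration.
PARTITION BY HASHAMP(HASHBUCKET(HASHROW(STORE_ID)));

-- After nodes (and therefore AMPs) are added, rows map to different
-- partitions, so the related statistics should be recollected:
COLLECT STATISTICS ON SALES_BY_AMP COLUMN (PARTITION);
COLLECT STATISTICS ON SALES_BY_AMP COLUMN (STORE_ID);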
I was once advised to recollect statistics on all the static ‘lookup’ tables held in a database, on a monthly basis, ‘just in case’ the statistics needed to be refreshed… needless to say, I never saw a reason to do this, as it never had any impact. Would there ever be a case where statistics become ‘stale’, and are therefore ignored by the optimizer, in such circumstances?