In Teradata 13.10 the Optimizer uses random-AMP sampling to detect data growth, which can be enough for tables without skew.
The weak point of this approach shows up as soon as you insert and delete rows without recollecting statistics. The Teradata 13.10 Optimizer extrapolates with heuristics to avoid underestimation of row counts in such cases.
The below test scenario proves the benefit of improvements introduced with Teradata 14.10.
Here are the details of our test setup. It’s used on both Teradata versions, 13.10 and 14.10:
Teradata Statistics Extrapolation – Happens always with 13.10
At first we are executing below query on Teradata 13.10, to show the behaviour up to release 13.10:
EXPLAIN SELECT * FROM TheDB.TheTable WHERE TheCol = ‘C’;
We are selecting a value which is not available in column TheCol (‘C’). Still, the Optimizer extrapolates and estimates 20,000 rows (10% of the table rows).
Why is the Optimizer doing this? To avoid the underestimation of row counts in the case of
- Inserts and deletes
- Without refreshing the statistics afterward
Still, the estimation is wrong: the real number of selected rows is 0.
Statistics Extrapolation on Teradata 14.10 – only if needed!
Luckily, starting with Teradata 14.10, object use count (OUC) can be utilized by the Optimizer to detect data changes since the last statistics collection.
The Optimizer queries the object use count (OUC). It avoids statistics extrapolation for unchanged tables. A vast improvement!
We repeat the execution of above SQL select statement but this time on Teradata 14.10. Now it has the following execution plan and estimations:
Notice that the Optimizer has “high confidence” on both Teradata versions. Still, the estimation on Teradata 14.10 is better (correct) because of OUC. Teradata 13.10 relies on the inaccurate heuristics to estimate changes done since the last statistics collection.