For indexed columns (PI, USI, NUSI) without statistics, Teradata calculates them at runtime. However, these statistics only contain a few key figures, such as the table cardinality and distinct values. This process is called dynamic AMP sampling or earlier also dynamic AMP sampling. Dynamic AMP samples are stored in the table header of each AMPs FSG cache.
For each table, a random AMP is chosen. It is determined by calculating the row hash of the internal TABLE_ID each database table has. The well-known hashing algorithm is used to select the target AMP:
CALCULATE ROWHASH(TABLE_ID) -> HASHMAP(ROWHASH) -> AMP
The restriction to the determination of some essential key figures is necessary because dynamic AMP samples have to be created during the creation of the execution plan (i.e. quickly) if they are not already available.
By default, dynamic AMP sampling is performed on a single AMP. This behavior can be changed: 2 AMPs, 5 AMPs, all AMPs of a node, or all-AMPs of all nodes can be used for sampling. To do this, changes have to be made in the system settings.
In the DBC control area, we can change the field “RandomAmpSampling.” The following settings are possible:
D…use the default 1 AMP sampling
L.. use 2 AMPs for sampling
M…use 5 AMPs for sampling
N…use all-AMPs belonging to one node for sampling
A…use all available AMPs in the system for sampling
It should be noted, however, that with the number of sampling AMPs, the time needed to create the statistics also increases. This can be a disadvantage for queries that typically return a result within a few seconds like tactical queries, but is usually not a problem with strategic workloads. More than one sampling AMP should be considered when tables are skewed, as this gives the optimizer a better picture of the data demographics.
From our experience with various Teradata installations, it only makes sense to choose between one-AMP sampling or ALl-AMP sampling. Everything in between may improve the execution plans but choosing the correct number of sampling AMPs is too complex and often does not bring the desired success. All-AMP sampling has the advantage that like collected statistics histograms are used to store detailed key figures.
Teradata always generates dynamic AMP samples for indexes even when statistics are collected. These can be used to detect stale statistics. Dynamic AMP Sampling is done in detail like this:
- If the table header is not available in the FSG cache, a dynamic AMP sample of the index is taken and distributed to all AMPs.
- The related DBC tables are checked, to see if collected statistics are available
- If no collected statistics are available, the dynamic AMP sample is used for estimations
- If the collected statistics are available, the dynamic AMP sample is compared against the collected statistics.
- If the difference between collected statistics and random AMP samples is above a certain threshold, the dynamic AMP sample will
be used. Otherwise, the collected statistics will be used. The parsing engine (PE) will be able to extrapolate statistics by considering the object use counts (OUC). Details you can read here:
The Sampling Process for PI, USI, NUSI
Depending on the settings, these steps will be done by one or several AMPs at the same time:
- Each sampling AMP reads 1 – 2 cylinders from disk into the FSG cache.
Usually, it’s two cylinders (the first and the last belonging to the table). But if the table fits entirely into one cylinder, only one cylinder has to be read.
- The average number of rows of the cylinder’s data blocks is calculated
- The average number of rows is multiplied by the number of data blocks
- Above result is multiplied by the number of cylinders belonging the table
- Above result is multiplied by the number of AMPs available in the entire system
The result of the above calculations is the estimation of the table’s row count.
Teradata uses a similar approach when doing random AMP sampling for a NUSI:
- Each sampling AMP reads 1 – 2 cylinders from disk into the FSG cache
- The number of NUSI values in the cylinder(s) is counted
- The table row count is divided by the NUSI row number calculated in the previous step; the result is the rows per NUSI value.
The random AMP sample of a NUSI can be used by the Teradata Optimizer to decide if, in a full table scan, the base table should be utilized or the NUSI.
Teradata is not using random AMP sampling for a USI. The Optimizer always assumes that the number of distinct values of the USI equals the number of table rows.
Dynamic AMP Sampling is not a substitute for collected statistics. You can read the details here: