Introduction to Teradata Dynamic AMP Sampling
Teradata calculates them at runtime for indexed columns (PI, USI, NUSI) without statistics. However, these statistics only contain some key figures, such as the table cardinality and distinct values. This process is called dynamic AMP sampling or earlier also dynamic AMP sampling. Dynamic AMP samples are stored in the table header of each AMPs FSG cache.
For each table, a random AMP is chosen. It is determined by calculating each database table’s row hash of the internal TABLE_ID. The well-known hashing algorithm is used to select the target AMP:
CALCULATE ROWHASH(TABLE_ID) -> HASHMAP(ROWHASH) -> AMP
The restriction to determining some essential key figures is necessary because dynamic AMP samples have to be created during the execution plan (i.e., quickly) if they are not already available.
By default, dynamic AMP sampling is performed on a single AMP. This behavior can be changed: 2 AMPs, 5 AMPs, all AMPs of a node, or all-AMPs of all nodes can be used for sampling. To do this, changes have to be made in the system settings.
In the DBC control area, we can change the field “RandomAmpSampling.” The following settings are possible:
D…use the default 1 AMP sampling
L.. use 2 AMPs for sampling
M…use 5 AMPs for sampling
N…use all-AMPs belonging to one node for sampling
A…use all available AMPs in the system for sampling
However, it should be noted that with the number of sampling AMPs, the time needed to create the statistics also increases. This can be a disadvantage for queries that typically return a result within a few seconds, like tactical queries, but is usually not a problem with strategic workloads. When tables are skewed, more than one sampling AMP should be considered, as this gives the Optimizer a better picture of the data demographics.
From our experience with various Teradata installations, it only makes sense to choose between one-AMP sampling or ALl-AMP sampling. Everything in between may improve the execution plans but selecting the correct number of sampling AMPs is too complex and often does not bring the desired success. All-AMP sampling has the advantage that, like collected statistics, histograms are used to store detailed key figures.
Teradata always generates dynamic AMP samples for indexes even when statistics are collected. These can be used to detect stale statistics. Dynamic AMP Sampling is done in detail like this:
- If the table header is unavailable in the FSG cache, a dynamic AMP sample of the index is taken and distributed to all AMPs.
- The related DBC tables are checked to see if collected statistics are available
- If no collected statistics are available, the dynamic AMP sample is used for estimations
- The dynamic AMP sample is compared against the collected statistics if the collected statistics are available.
- If the difference between collected statistics and random AMP samples is above a certain threshold, the dynamic AMP sample will
be used. Otherwise, the collected statistics will be used. The parsing engine (PE) can extrapolate statistics by considering the object use counts (OUC). Details you can read here:
The Sampling Process for PI, USI, NUSI
Depending on the settings, these steps will be done by one or several AMPs at the same time:
- Each sampling AMP reads 1 – 2 cylinders into the FSG cache from the disk.
Usually, it’s two cylinders (the first and the last belonging to the table). But if the table fits entirely into one cylinder, only one cylinder must be read.
- The average number of rows of the cylinder’s data blocks is calculated
- The average number of rows is multiplied by the number of data blocks
- The above result is multiplied by the number of cylinders belonging to the table
- The above result is multiplied by the number of AMPs available in the entire system
The result of the above calculations is estimating the table’s row count.
Teradata uses a similar approach when doing random AMP sampling for a NUSI:
- Each sampling AMP reads 1 – 2 cylinders from disk into the FSG cache
- The number of NUSI values in the cylinder(s) is counted
- The table row count is divided by the NUSI row number calculated in the previous step; the result is the rows per NUSI value.
The Teradata Optimizer can use the random AMP sample of a NUSI to decide if the base table should be utilized in a full table scan or the NUSI.
Teradata is not using random AMP sampling for a USI. The Optimizer always assumes that the number of distinct values of the USI equals the number of table rows.
Dynamic AMP Sampling is not a substitute for collected statistics. You can read the details here: