Introduction to Teradata Dynamic AMP Sampling
Teradata calculates dynamic AMP samples for indexed columns (PI, USI, NUSI) at runtime without requiring statistics. These samples provide key information, including table cardinality and distinct values. They are stored in the FSG cache of each AMP’s table header. This process is referred to as dynamic AMP sampling.
A random AMP is selected for each table based on the row hash of its internal TABLE_ID, using a well-known hashing algorithm.
CALCULATE ROWHASH(TABLE_ID) -> HASHMAP(ROWHASH) -> AMP
Restricting the determination of essential key figures is necessary to quickly create dynamic AMP samples during the execution plan if already not present.
Dynamic AMP sampling is initially limited to one AMP, but it can be altered. The system settings allow for selecting 2, 5, or all AMPs within a node or all AMPs within all nodes for sampling.
The “RandomAmpSampling” field can be modified in the DBC control area. The available options include:
D…use the default 1 AMP sampling
L.. use 2 AMPs for sampling
M…use 5 AMPs for sampling
N…use all AMPs belonging to one node for sampling
A…use all available AMPs in the system for sampling
More sampling AMPs mean increased time for statistical creation, which can hinder the speed of tactical queries. However, this is generally not a concern for strategic workloads. When tables are skewed, multiple sampling AMPs should be utilized to give the Optimizer a more accurate representation of the data demographics.
Based on our experience with Teradata installations, choosing either one-AMP or All-AMP sampling is advisable. The intermediate options may enhance execution plans, but determining the proper number of sampling AMPs is typically complicated and does not yield the desired results. All-AMP sampling offers the advantage of utilizing histograms to retain precise key figures, similar to how collected statistics operate.
Teradata always generates dynamic AMP samples for indexes to detect stale statistics, even if statistics have already been collected. The process of Dynamic AMP Sampling is as follows:
- If the table header is unavailable in the FSG cache, a dynamic AMP sample of the index is taken and distributed to all AMPs.
- The related DBC tables are checked to see if collected statistics are available
- If no collected statistics are available, the dynamic AMP sample is used for estimations
- The dynamic AMP sample is compared against the collected statistics if the collected statistics are available.
- The dynamic AMP sample will be used if the difference between collected statistics and random AMP samples is above a certain threshold. Otherwise, the collected statistics will be used. The parsing engine (PE) can extrapolate statistics by considering the object use counts (OUC). Details you can read here:
The Sampling Process for PI, USI, NUSI
These steps may be executed simultaneously by one or multiple AMPs based on their configurations.
- Each sampling AMP reads 1 – 2 cylinders into the FSG cache from the disk.
Usually, it’s two cylinders (the first and the last belonging to the table). But if the table fits entirely into one cylinder, only one must be read.
- The average number of rows of the cylinder’s data blocks is calculated
- The average number of rows is multiplied by the number of data blocks
- The above result is multiplied by the number of cylinders belonging to the table
- The above result is multiplied by the number of AMPs available in the entire system
The calculations above estimate the row count of the table.
Teradata employs a comparable method for conducting random AMP sampling for a NUSI.
- Each sampling AMP reads 1 – 2 cylinders from the disk into the FSG cache.
- The number of NUSI values in the cylinder(s) is counted.
- The table row count is divided by the NUSI row number calculated in the previous step; the result is the rows per NUSI value.
The Teradata Optimizer can utilize a random AMP sample of a NUSI to determine whether to perform a full table scan on the base table or the NUSI.
Teradata does not use random AMP sampling for a Unique Secondary Index (USI). The Optimizer assumes that the number of distinct values in the USI equals the number of rows in the table.
Dynamic AMP Sampling cannot replace collected statistics.