Introduction to Teradata Sparse Maps
Teradata's hashing architecture, and the way it spreads the rows of a table evenly across the AMPs for maximum parallelism, has been unchanged since the beginning. What is new is the ability to create your own hash maps. The hash maps we usually work with are also called contiguous maps. If you are not familiar with the details of how Teradata distributes the rows of tables, I recommend you read this article first and then come back:
Relatively new is the concept of Teradata sparse maps. They are used when we do not want to distribute a table over all AMPs. Unlike contiguous maps, we can define sparse maps almost arbitrarily. The goal of sparse maps is to keep tables on one or a few AMPs:
Sparse maps exist within a contiguous map. Teradata comes with two predefined sparse maps:
- Single AMP Sparse Map
- Sparse map with one AMP per Node
When should we use sparse maps?
Teradata sparse maps are used with small tables to prevent all-AMP access. Besides being more efficient, this avoids unnecessary waiting on a skewed system: an all-AMP step has to wait for the slowest AMP, even though Teradata only needs to read rows from one or a few AMPs. Sparse maps allow the optimizer to perform single-AMP operations, with advantages such as fewer I/Os and less vulnerability to congestion.
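The effect of waiting for the slowest AMP can be illustrated with a small toy model in Python. This is only a sketch under assumptions: the per-AMP step times are invented numbers, the function names are hypothetical, and real response times depend on many more factors. It shows that an all-AMP step finishes only when its slowest AMP is done, while a single-AMP step depends on one AMP only.

```python
# Toy model: elapsed time of an all-AMP step vs. a single-AMP step.
# The per-AMP times (milliseconds) are invented for illustration only.

def all_amp_step_time(amp_times_ms):
    """An all-AMP step is finished only when the slowest AMP is done."""
    return max(amp_times_ms)

def single_amp_step_time(amp_times_ms, amp_id):
    """A single-AMP step depends only on the AMP that holds the table."""
    return amp_times_ms[amp_id]

# Hypothetical system with 8 AMPs; AMP 5 is overloaded (skewed system).
amp_times_ms = [4, 5, 4, 6, 5, 120, 4, 5]

all_amp_ms = all_amp_step_time(amp_times_ms)
single_amp_ms = single_amp_step_time(amp_times_ms, amp_id=2)

print(f"all-AMP step:    {all_amp_ms} ms")
print(f"single-AMP step: {single_amp_ms} ms")
```

If the table happened to live on the overloaded AMP, the single-AMP step would of course be slow as well. The model only shows that the state of all other AMPs no longer matters.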
We can therefore expect more stable response times. As you may have noticed, these properties make sparse maps especially suitable for tactical workloads. Single-AMP sparse maps allow the optimizer to prioritize our queries as a tactical workload. If you are interested in tactical workload tuning, I can recommend this article:
Teradata sparse maps are ideal for small tables, especially those with fewer rows than the system has AMPs. Using sparse maps for tiny tables also minimizes the risk of skewed dynamic AMP samples. In general, small tables that we query frequently are good candidates.
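To see why a table with fewer rows than AMPs is a poor fit for a contiguous map, consider the following sketch. It assumes a hypothetical system with 100 AMPs and a lookup table with 10 rows, and it uses SHA-256 as a deterministic stand-in for row hashing. This is not Teradata's hash function; only the counting argument matters here.

```python
import hashlib

def amp_for_row(primary_index_value, amp_count):
    """Toy stand-in for row hashing: map a primary index value to an AMP.

    NOT Teradata's hash function, just a deterministic placeholder.
    """
    digest = hashlib.sha256(str(primary_index_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % amp_count

AMP_COUNT = 100  # hypothetical system size
ROW_COUNT = 10   # tiny lookup table with fewer rows than AMPs

# Set of AMPs that receive at least one row of the small table.
amps_with_rows = {amp_for_row(pi, AMP_COUNT) for pi in range(ROW_COUNT)}
idle_amps = AMP_COUNT - len(amps_with_rows)

print(f"AMPs holding rows: {len(amps_with_rows)}")
print(f"AMPs without any row of this table: {idle_amps}")
```

However the rows are hashed, at most 10 of the 100 AMPs can hold a row, so at least 90 AMPs take part in an all-AMP step without contributing any data.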
How are tables distributed inside the sparse map?
The tables within a single-AMP sparse map are not all mapped to the same AMP, since that would overload this AMP. Instead, Teradata decides for each table which AMP will hold it, and spreads the tables evenly across the AMPs of all nodes. In other words, the sparse map does not assign the rows of all its tables to one AMP. Here the whole is graphically represented. There are three tables (each shown in a different color) held on three AMPs:
This design, with its automatic distribution of small tables to different AMPs, has the advantage of spreading the workload evenly. But there is also one main disadvantage that we have to consider: even though two tables are in the same single-AMP sparse map, they cannot be joined AMP-locally because the tables are on different AMPs. With contiguous maps, this is possible if the primary indexes of both tables match the join columns. However, Teradata offers the possibility to force different tables onto the same AMP(s) by specifying a so-called colocation name when creating a table (MAP = MySingleAMPMap COLOCATE USING my_colocation_name).
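The idea behind per-table placement and colocation names can be sketched as a toy model in Python. Everything here is an assumption made for illustration: how Teradata actually picks the AMP is internal to the system, the SHA-256 hash is only a deterministic placeholder, and the table names and the helper functions `amp_for_name` and `place_table` are hypothetical. The model only shows that tables sharing a colocation name end up on the same AMP, while other tables are placed independently.

```python
import hashlib

def amp_for_name(placement_name, amp_count):
    """Toy placement: derive an AMP number from a name (placeholder hash)."""
    digest = hashlib.sha256(placement_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % amp_count

def place_table(table_name, amp_count, colocation_name=None):
    """Pick the AMP for a table in a single-AMP sparse map (toy model).

    If a colocation name is given, it decides the placement;
    otherwise the table's own name does.
    """
    return amp_for_name(colocation_name or table_name, amp_count)

AMP_COUNT = 64  # hypothetical system size

# Two small tables that we want to join AMP-locally share a colocation name.
status_amp = place_table("sales.order_status", AMP_COUNT,
                         colocation_name="my_colocation_name")
codes_amp = place_table("sales.country_codes", AMP_COUNT,
                        colocation_name="my_colocation_name")

# A third table without a colocation name is placed independently.
currency_amp = place_table("sales.currency", AMP_COUNT)

print(f"order_status  -> AMP {status_amp}")
print(f"country_codes -> AMP {codes_amp}")
print(f"currency      -> AMP {currency_amp}")
```

In this model the first two tables always share an AMP, which is the precondition for an AMP-local join. The third table may or may not land on the same AMP.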
As we have seen, sparse maps are an essential tool to improve the performance of small tables. Tactical workloads in particular can benefit considerably from using sparse maps. Teradata provides automated capabilities in Viewpoint to find small tables and switch them to a sparse map. Teradata considers tables to be small if they are smaller than 128 kilobytes, but this might change in the future.
Can I use Teradata Sparse Maps for big tables?
In some cases, sparse maps can also be used for large tables. In the following article, you will learn how this is possible, but we will also show you alternatives to sparse maps (e.g., if you are on an older Teradata system where this feature is not yet available):
Finally, I would like to refer you to an article about tactical workloads, because that is where sparse maps are predominantly needed: