Tactical Query Optimization with Sparse Maps in Teradata 16.10
Sparse Maps are a new feature which comes with Teradata 16.10 which allows us placing all rows of small tables on one AMP or on one AMP per Node. These new type of maps are defined within the hash maps (their parent maps) which are used to distribute table rows by primary index value across all AMPs.
Teradata 16.10 includes automatically one single-AMP sparse map, and one multi-AMP sparse map covering one AMP per Node in the system configuration. But of course, you can define additional ones. Each sparse map can contain an unlimited number of small tables.
The reason to make use of sparse maps for small tables is to limit work to one or a few AMPs, avoiding all-AMP steps.
Assume a 500-AMP system. Even if our small table only contains 3 rows, each AMP has to do the following tasks in order to check for existing rows:
- Get an AMP worker task
- Read the table header
- Reserve memory
- One physical IO has to be done, even if there is no row on this AMP!
While this doesn't sound too dramatic, remember that the system could be congested, and an AMP not even holding any row might be in flow control. This would delay the completion of the access step.
By using the new sparse maps feature of Teradata 16.10, all-AMP steps can easily be changed into single-AMP steps for small tables. Coming back to our 500 AMP system, 499 AMPs would now not need to get an AMP worker task.
Furthermore, we get rid of the physical IO for 499 AMPs and reduce congestion risk. If we access our small table which is distributed by the sparse map, it might even qualify to be handled as priority (tactical) workload.
Typical candidates to be moved to the sparse map are tables which have fewer rows than AMPS and are accessed frequently and stay small over time.
It is recommended to move only tables to the sparse map which are less than 128KB in size. For a multi-AMP sparse map, we would multiply the 128KB with the number of Nodes (remember, multi-amp sparse maps contain one AMP per node). On a 4 node system, this would be tables up to 4+128KB = 512KB.
Please be careful when looking for good candidates for the sparse map. Empty tables seem to be a great choice to be moved, but you have to ensure that they really stay empty and are not just stage tables which will be repopulated with millions or billions of rows in your next batch load process.
Sparse maps might be useful to replace some existing techniques described here:
A great article by David Wellman compares the different ways of handling small tables in this article: