All Rows on One AMP – By Design

Probably the first thing that anyone learns about designing tables for Teradata is to spread the rows evenly across all AMPs in the system. So if you have 100,000 rows on a 100-AMP system, the aim is to have approximately 1,000 rows on each AMP.

For a lot of tables in a Teradata system, this is a design guideline that I completely agree with. However, there is a group of tables for which I think a different design approach is useful: one that deliberately stores all of a table's rows on one AMP. Yes, I know, this is completely counter-intuitive in a Teradata environment, but please read on.

The group of tables where I think this design approach applies are the small reference tables (sometimes called ‘lookup’ tables) that have very few rows – often only a few tens or hundreds of rows.

What is the new design?

For tables in this group, add an additional column (called PICOL in this article) and use it as the non-unique primary index (NUPI) of the table. On every row in the table, this column contains the same data value, forcing all rows onto the same AMP and typically into the same data block.
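As a sketch, the DDL for such a table might look like the following. The table and column names (other than PICOL) are illustrative, not taken from the article's actual schema:

```sql
-- Hypothetical 'dept' lookup table; PICOL is the NUPI.
-- Every row stores the same constant (1) in PICOL, so all rows
-- hash to the same AMP and normally sit in one data block.
CREATE TABLE WORKDB.dept
( PICOL     BYTEINT NOT NULL DEFAULT 1  -- same value on every row
, dept_no   INTEGER NOT NULL
, dept_name VARCHAR(30)
)
PRIMARY INDEX ( PICOL );
```

Note that PICOL is deliberately a NUPI, not a unique index: every row carries the same value, which is exactly what forces co-location.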

How does this help?

Firstly, this design approach is NOT going to make SQL requests run noticeably faster.

However, it will make them run more efficiently, because they use fewer CPU and I/O resources to achieve the same result.

What is going to change?

With this design change implemented, any SQL that references this table needs to include an additional selection criterion against column PICOL to select the specific data value. This would usually be added into the base view which references the table, so there are minimal changes to application code. The ETL code that loads or updates this table will need to ensure that the required value is inserted into column PICOL.
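The view and ETL changes might be sketched as follows. The view name and the column list are illustrative; the constant 1 must match whatever value the ETL loads into PICOL:

```sql
-- Base view hides PICOL from applications and pins the
-- single-AMP access path via the NUPI.
REPLACE VIEW WORKDB.dept_v AS
SELECT dept_no
     , dept_name
FROM   WORKDB.dept
WHERE  PICOL = 1;

-- ETL inserts must supply the constant PICOL value.
INSERT INTO WORKDB.dept (PICOL, dept_no, dept_name)
VALUES (1, 42, 'Engineering');
```

Because applications query the view, they pick up the PICOL predicate (and hence the single-AMP plan) without any code changes.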

Having made those changes, the plan will use a single-AMP read of the table instead of an all-AMP read, thereby saving AWTs (AMP Worker Tasks), CPU and I/O. The query provides the same functional result but uses fewer resources, hence making it more efficient.

The following two (partial) plans show the differences.

This first plan uses the conventional table design (table ‘dept’), whereby the rows are distributed as evenly as possible across the AMPs.

1) First, we lock a distinct WORKDB.”pseudo table” for read on a
RowHash to prevent global deadlock for WORKDB.dept.

2) Next, we lock a distinct WORKDB.”pseudo table” for read on a
RowHash to prevent global deadlock for WORKDB.emp.

3) We lock WORKDB.dept for read, and we lock WORKDB.emp for read.

4) We do an all-AMPs RETRIEVE step from WORKDB.dept by way of an
all-rows scan with no residual conditions into Spool 2 (all_amps),
which is duplicated on all AMPs.  The size of Spool 2 is estimated
with high confidence to be 70 rows (7,280 bytes).  The estimated
time for this step is 0.04 seconds.

 

This second plan uses the alternative table design (still on table ‘dept’), whereby all rows are on a single AMP.

1) First, we lock a distinct WORKDB.”pseudo table” for read on a
RowHash to prevent global deadlock for WORKDB.emp.

2) Next, we lock WORKDB.emp for read.

3) We do a single-AMP RETRIEVE step from WORKDB.dept by way of the
primary index “WORKDB.dept.PICOL = 1” with no residual conditions
into Spool 2 (all_amps), which is duplicated on all AMPs.  The
size of Spool 2 is estimated with high confidence to be 70 rows (
7,280 bytes).  The estimated time for this step is 0.03 seconds.

 

A few differences to note between these two plans:

  • The second plan accesses the table “by way of the primary index”. This is a result of the selection criterion added to the SQL request – one of the changes noted above.
  • The second plan does not require a table-level, all-AMPs lock on the table with the new design; it is now locked using a row-hash lock.
  • Although not shown, the step that reads the table and duplicates it to all AMPs will use fewer AWTs. A duplication step requires two AWTs on each AMP (a total of 2 * #AMPs on the system): one to read the data rows and send them to all AMPs, and a second to receive the distributed rows. In the second plan, because only one AMP is used to read the rows, the number of AWTs used is 1 + #AMPs on the system.
    On a 100-AMP system the number of AWTs used for this step is reduced from 200 to 101.
  • (Again not shown) where the ‘lookup’ table fits into a single data block, the original plan would typically require a single I/O on each AMP, whereas the new plan requires a single I/O on one AMP.

When Can I (And When Can I Not) Use This Design?

As with a lot of performance design tips, you have to use this in specific places. Here are some guidelines to help you:

  • Use it only for small tables that you know will not grow.
  • This design can cater for a few thousand rows: a 127.5 KB data block can hold roughly 520 250-byte rows (row length measured after compression), and a somewhat larger table simply spans a handful of data blocks on the same AMP.
  • Add the appropriate WHERE clause into the base view that references the table.
  • The aim is to reduce the number of AWTs used for processing; this will not change response times, although you may find that a frequently referenced table simply stays resident in memory on its node.
  • If you decide to use this approach for a number of tables, do not use the same data value for column PICOL on all tables. Doing so would potentially overload one AMP. Use different values (perhaps derived from the table name?) to ensure that the tables’ rows are assigned to different AMPs.
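One way to choose distinct PICOL values is to check in advance which AMP each candidate value hashes to, using Teradata's hash functions. A sketch, with illustrative values and table roles:

```sql
-- Check which AMP each candidate PICOL value hashes to before
-- assigning it, so that different lookup tables end up on
-- different AMPs rather than piling onto one.
SELECT HASHAMP(HASHBUCKET(HASHROW(1))) AS amp_for_dept
     , HASHAMP(HASHBUCKET(HASHROW(2))) AS amp_for_region
     , HASHAMP(HASHBUCKET(HASHROW(3))) AS amp_for_status;
```

If two candidate values map to the same AMP, pick a different constant for one of the tables until the lookup tables are spread across different AMPs.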
Written by David Wellman on July 1, 2015.
