How to use the Teradata MAPS Architecture?

In an earlier article (link to All Rows on One AMP – By Design), I showed an alternative design for small reference tables, ensuring that all rows in the table are stored on a single Teradata AMP using the Teradata MAPS architecture. The design allows for single-AMP retrieval of the entire table, usually before joining, which uses fewer AWTs.

During a recent Teradata education class that I was teaching, a student commented that most, if not all, of their reference tables, were too large to use that approach because they all contain history rows (effectively, they are slowly changing dimensions). I was asked if my proposed design could be extended for slightly larger reference tables.

This article discusses four design options I’ve investigated due to those discussions. 

During this investigation and testing, I also started working on Teradata 16.10, which includes the “Multiple Hash Maps” (MHM) feature, which provides some options for the designs included here. The Teradata MAPS architecture allows us to use additional hash maps to distribute rows according to different needs. We also show alternatives for Teradata versions smaller than 16.10 without using the Teradata MAPS architecture.

Three Scenarios– Small, Medium, and Large with or without Teradata MAPS architecture

While investigating and testing this requirement, I quickly realized that the design needs to cater to the reference tables’ different sizes. For this article, I’ve described these as ‘Small’, ‘Medium’, and ‘Large’, with the definitions of these terms shown below:

SmallThe entire table fits into 1 or 2 data blocks (typically 1 – 2 MB of PERM space).

 

I suspect that such tables will often not contain any history of changes. The table only holds ‘current rows’.

MediumAll rows (both ‘current’ and ‘history’) fit into a few data blocks, probably less than one cylinder (@11MB).

 

The ‘current rows’ fit into 1 or 2 data blocks (typically 1 – 2 MB of PERM space).

LargeThe ‘current rows’ fit into 1 or 1 data block (typically 1 – 2 MB of PERM space), but the ‘history rows’ can grow as large as required (to meet business requirements).

What is the requirement?

This design is based on the following requirements I have encountered at some customer sites. They may not exactly match yours, but I hope they are similar.

  • The table optionally contains ‘current’ rows or ‘history’ rows.
  • The table contains ‘start date’ and ‘end date’ columns that define when the row’s data is effective.
  • A specific value indicates the current row in the ’end date’ column. In a Teradata system, this is often “NULL”, “3500-12-31” or “9999-12-31”.
  • Typical queries using this table only require ‘current’ rows.

What are the new design options?

With the requirements stated above, my suggested design depends on two factors:

  1. Is the reference table ‘Small’, ‘Medium’, or ‘Large’.
  2. What version of the Teradata DBMS is being used? The critical factor is “< TD 16.10” or “>= TD 16.10”. The Teradata MAPS architecture requires at least Teradata Version 16.10.

Having determined where your table fits regarding the two factors shown above, use the following table to determine which design is best for you. The designs are explained in detail after this table, and examples are given.

Teradata version < TD 16.10

Teradata version >= TD 16.10 using the Teradata MAPS architecture

Small

Add an extra column to the table definition, which is the PI for the table. All rows have the same value for this column. (Design#1)Create the table with a ‘natural’ definition (UPI on the code column) and assign it to a 1-AMP Hash Map (Design#2). This requires the availability of the Teradata MAPS architecture.

Medium

(See below – ‘Large table’)Create the table using a NUPI on the code column, and add Row Partitioning using the ‘current row’ column (“End date” for this article) as the PPI column. Assign the table to a 1 AMP Hash Map (Teradata MAPS architecture)

 

Define the partitioning scheme so that all ‘current’ rows go into a single partition.

(Design#3)

Large

Create the table with a ‘natural’ definition (UPI on the code column).

 

Create a sparse Single Table Join Index (STJI) with a WHERE clause that includes the ‘current’ row data value.

Note: It is critical that the WHERE clause is not an equality constraint against this ‘current’ row data value but includes it.

Note: The ‘current’ row data value must be non-null.

(Design#4)

The following section shows examples of each of the four designs.

Design#1: Small table, < TD 16.10 (without Teradata MAPS architecture)

The table definition is as shown in my previous article on this subject (link to All Rows on One AMP – By Design):

CREATE SET TABLE small_ref_table(picol INTEGER NOT NULL,ref_code CHAR(2) NOT NULL,ref_desc VARCHAR(200))PRIMARY INDEX(picol);

With the above definition in place, a typical query that joins to this table now includes (in the view definition) selection against the Primary Index column. So the relevant part of the plan becomes:

2) Next, we do a single-AMP RETRIEVE step from small_ref_table by way of the primary index “small_ref_table.picol = 1” with no residual conditions into Spool 2 (all_amps), which is duplicated on allAMPs. The size of Spool 2 is estimated …

Design#2: Small table,>= TD 16.10 using the Teradata MAPS architecture

This table definition uses a ‘natural’ Primary Index but the TD16.10 MHM feature to ensure that all primary data rows are assigned to a single AMP.

Note: the actual map name (“TD_1AmpSparseMap_1Node” in the example below) may be different on your system).

CREATE SET TABLE small_ref_table,MAP = TD_1AmpSparseMap_1Node(ref_code CHAR(2) NOT NULL,ref_desc VARCHAR(200))UNIQUE PRIMARY INDEX(ref_code);

With the above definition in place, a typical query that joins to this table now includes (in the view definition) selection against the Primary Index column. So the relevant part of the plan becomes:

4) We do a single-AMP RETRIEVE step from small_ref_table by way of an all-rows scan with no residual conditions into Spool 3 (all_amps), which is duplicated on all AMPs in TD_Map1. The size of Spool 3 is estimated …

Design#3: Medium table,>= TD 16.10 using the Teradata MAPS architecture

This table definition keeps the ‘code’ column as the Primary Index but adds in row partitioning on the ‘end_date’ column (the one used to identify the ’current’ row). The table now contains ‘history’ rows, and so the PI is defined as non-unique.

Note: the actual map name (“TD_1AmpSparseMap_1Node” in the example below) may be different on your system).

CREATE TABLE medium_ref_table,MAP = TD_1AmpSparseMap_1Node(ref_code CHAR(2) NOT NULL,ref_desc VARCHAR(200),start_date DATE NOT NULL,end_date DATE NOT NULL — uses 2099-12-31 as ‘current row indicator value’PRIMARY INDEX(ref_code)PARTITION BY RANGE_N(end_date BETWEEN DATE ‘2010-01-01’ AND DATE ‘2020-12-31’EACH INTERVAL ‘1’ DAY,NO RANGE);

With the above definition in place, a typical query that joins this table now includes (in the view definition) selection against the partitioning column so that only ‘current’ rows are returned, only requiring one partition to be accessed. The relevant part of the plan becomes:

4) We do a single-AMP RETRIEVE step from a single partition ofmedium_ref_table with a condition of (“medium_ref_table.end_date= DATE ‘2099-12-31′”) with a residual condition of(“medium_ref_table.end_date = DATE ‘2099-12-31′”) into Spool 3(all_amps), which is duplicated on all AMPs in TD_Map1. The size of Spool 3 is estimated …

Note: a slight alternative to the above design is to use a NULL value in column ‘end_date’ to identify the ‘current’ row. If this is your requirement, change the design to include the special ‘UNKNOWN’ partition (instead of ‘NO RANGE’) and change the selection criteria to “WHERE end_date IS NULL”.

Design#4: Large table (all TD releases), Medium table < TD 16.10 without Teradata MAPS architecture

This design uses a sparse, single table join index (STJI) to give us single AMP access to the required data for ’current’ rows. The optimizer uses the STJI to cover the query partially.

CREATE TABLE large_ref_table(ref_code CHAR(2) NOT NULL,ref_desc VARCHAR(200),start_date DATE NOT NULL,end_date DATE NOT NULL — uses 2099-12-31 as ‘current row indicator value’UNIQUE PRIMARY INDEX(ref_code);

CREATE JOIN INDEX large_ref_table_stjiASSELECT end_date,ref_code,ref_desc,start_dateFROM large_ref_tableWHERE end_date BETWEEN DATE ‘2099-12-30’ AND DATE ‘2099-12-31’PRIMARY INDEX (end_date);

Note: It is critical that the WHERE clause in the STJI definition is not an equality constraint against the ‘current’ row data value but includes it.

Note: The ‘current’ row data value must be non-null. If you use NULL as the ’current’ row data value, you will not get a single AMP retrieved from the STJI. This is because of a specific query re-write feature in Teradata.

With the above definition in place, a typical query that joins this table now includes (in the view definition) selection against the ‘end_date’ column so that only ‘current’ rows are returned. The optimizer uses the STJI instead of the table. The relevant part of the plan becomes:

3) We do a single-AMP RETRIEVE step from LARGE_REF_TABLE_STJIby way of the primary index “LARGE_REF_TABLE_STJI.end_date= DATE ‘2099-12-31′” with no residual conditions into Spool 3(all_amps), which is duplicated on all AMPs in TD_Map1. The size of Spool 3 is estimated …

An additional benefit of this design, which may prove useful, is that your base table (“large_ref_table” in my example) can contain additional columns. For instance, you could have other columns which record:

  • The name of the user that initially created each row (CREATE_USERNAME).
  • The date and time when a row was created (CREATE_TS).
  • The name of the user that last updated each row (LASTALTER_USERNAME).
  • The date and time when a row was last updated (LASTALTER_TS).

If you include these columns in your base table design, do not include them in the STJI definition unless most queries also need them (remember that the intent is to have all real data rows for the STJI fit into 1 or 2 data blocks).

When Can I (And When Can I Not) Use This Design?

Unlike the original design I described (‘Design#1’ above), this approach can be used against any size table reference. You are no longer constrained by overall table size, only the size of the ‘current’ rows.

Here are some guidelines to help you when thinking about how and where to use the designs from this article:

  • Add the appropriate WHERE clause into the base view that references the table.
  • The aim is to reduce the resources used for processing; this will not noticeably change performance, although you may find that a frequently referenced table (or at least the ‘current’ rows) will sit in memory on the Node.
  • If you use Design#1 for some tables, do not use the same data value for column PICOL on all tables. Doing so would potentially overload one AMP. Use different values (perhaps the table name?) to ensure the table rows are assigned to other AMPs.
  • If you use Design#2 and Design#3 for some tables, do not use the same value for the COLOCATE USING clause on all tables. Doing so would potentially overload one AMP. I suggest you not code the COLOCATE USING clause to ensure the table rows are assigned to different AMPs.

Summary

This design approach will NOT make SQL requests run noticeably faster. However, it will make them run more efficiently because they use fewer AWTs, CPU cycles, and IO operations to achieve the same result. If possible, use the Teradata MAPS architecture with a 1-AMP map.

Since you have read this article, you might also be interested in the following article in which the maps feature is described in detail:

David Wellman

David Wellman is the Technical Director of Ward Analytics Ltd, a UK company specialising in Teradata performance analysis and management. David has been using the Teradata products for over 20 years and regularly delivers Teradata training courses and performance consulting engagements. David is a Teradata Certified Master at V2R3, V2R5, TD12 and TD14.

More details at: http://www.ward-analytics.com

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

You might also like

>