fbpx

In a normalized PDM, relation tables are often difficult to query. We have to choose precisely one Primary Index as a query path for direct accesses. The other side of the relationship typically needs a full table scan. Using a Teradata join index can help improve the performance of queries. Below is an example table. We have chosen the column LeftId as the primary index:

CREATE MULTISET TABLE DWHPRO.TheRelationshipTable
(
    LeftId BIGINT NOT NULL,
    RightId BIGINT NOT NULL,
    RelationTypeCd CHAR(10)
PRIMARY INDEX ( LeftId )
PARTITION BY 
;

While we can access the Single-AMP Primary Index for a query via LeftId, the query via RightId requires a full table scan as shown below:

SELECT * FROM DWHPRO.TheRelationshipTable WHERE RightId = 1000;
EXPLAIN SELECT * FROM DWHPRO.TheRelationshipTable WHERE RightId = 1000;

1) First, we lock DWHPRO.TheRelationshipTable in TD_MAP1 for
read on a reserved RowHash in all partitions to prevent global deadlock.
2) Next, we lock DWHPRO.TheRelationshipTable in TD_MAP1 for read. 
3) We do an all-AMPs RETRIEVE step in TD_MAP1 from 
DWHPRO.TheRelationshipTable by way of an all-rows scan with a condition of ("DWHPRO.TheRelationshipTable.RightId = 10000") into Spool 1 (group_amps), which is built locally on the AMPs. The size of Spool 1 is estimated with high confidence to be 9 rows (1,611 bytes).  The estimated time for this step is 0.08 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of statement 1. The total estimated time is 0.08 seconds.

A simple solution to having direct access to both sides of the relationship is to create a join index. It is recommended to keep the join index as slim as possible. Therefore it is recommended to implement a join back to the base table via the ROWID as shown in the example below (instead of adding all columns into the index):

CREATE JOIN INDEX DWHPRO.JI_RELATIONSHIP
AS
SELECT
    (RightId,RelationTypeCd),
    (ROWID)
FROM
    DWHPRO.TheRelationshipTable
PRIMARY INDEX (RightId);

As the associated execution plan below shows, the join index is used. First, its row is accessed via PI lookup. Then, the base tables rows are joined by joining back over the ROWID stored in the Join Index:

Explain SELECT * FROM DWHPRO.TheRelationshipTable
WHERE RightId = 10000;
 
1) First, we do a single-AMP RETRIEVE step from
DWHPRO.JI_RELATIONSHIP by way of the primary index
"DWHPRO.JI_RELATIONSHIP.RightId = 10000" with no
residual conditions into Spool 2 (group_amps), which is
redistributed by the rowkey of (DWHPRO.JI_RELATIONSHIP.Field_1038) to few AMPs in TD_Map1.
Then we do a SORT to order Spool 2 by the sort key in spool field1. The size of Spool 2 is estimated with high confidence to be 9 rows (450 bytes). The estimated time for this step is 0.00 seconds.
2) Next, we do a group-AMPs JOIN step in TD_Map1 from Spool 2 (Last Use) by way of an all-rows scan, which is joined to
DWHPRO.TheRelationshipTable by way of an all-rows scan with
no residual conditions.  Spool and DWHPRO.TheRelationshipTable are joined using a rowkey-based
row id join, with a join condition of ("Field_1 =   DWHPRO.TheRelationshipTable.ROWID"). The result goes into
Spool 1 (group_amps), which is built locally on the AMPs. The
size of Spool 1 is estimated with low confidence to be 9 rows (
1,611 bytes).  The estimated time for this step is 0.05 seconds.
3) Finally, we send out an END TRANSACTION step to all AMPs involved in processing the request. 
-> The contents of Spool 1 are sent back to the user as the result of statement 1.  The total estimated time is 0.05 seconds.

The new execution plan is already an enormous improvement over the initial one; The optimizer can efficiently query and join the relationship from both sides (RightId and LeftId). What we have additionally done in our example is to compress the join index. Compression makes sense if the indexed side of the relationship has one or more biased values. With this approach, we can prevent to create a skewed join index and additionally save storage space.

__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"62516":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"62516":{"val":"var(--tcb-skin-color-0)"}},"gradients":[]},"original":{"colors":{"62516":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"b4fbe":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"b4fbe":{"val":"rgb(241, 99, 52)"}},"gradients":[]},"original":{"colors":{"b4fbe":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
Previous Article
Teradata Book Query Performance Tuning
Buy the Book Teradata Query Performance Tuning

DWH Pro Admin

Roland Wenzlofsky is an experienced freelance Teradata Consultant & Performance Trainer. Born in Austria's capital Vienna, he is building and tuning some of the largest Teradata Data Warehouses in the European financial and telecommunication sectors for more than 20 years. He has played all the roles of developer, designer, business analyst, and project manager. Therefore, he knows like no other the problems and obstacles that make many data warehouse projects fail and all the tricks and tips that will help you succeed.

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

You might also like

>