Unneccessary IO elimination – Teradata View Optimization

0
206

GUEST POST BY ARTEMIY KOZYR

Summary

Today I am going to show you how to identify the source problem to tune the performance of query utilizing the unnecessary amount of IO, what UII indicator is, and how WHERE clause placed on a VIEW is evaluated.

This case shows that the WHERE clause placed on VIEW for 2 Partitioned tables resulted in excessive block reads and applying WHERE condition afterward. You need to be fully aware of the amount of data query reads and the amount of data query really needs.

Performance tuning boosted the performance 5 fold. The query is completed 4-5 times faster and utilizes 4-5 times fewer system resources:

Teradata View Optimization

1+ Billion rows INSERTED

Here is the target query for today which populates large fact table of the data mart.

– INSERT INTO … SELECT FROM …

– The query is run on a daily basis

– Over 1000M rows inserted (1 billion)

INSERT INTO PRD3_1_db_dmcrmuakb.t_crm_ft_campaign_membership  (         …        ) SELECT  … FROM    PRD3_1_db_dmcrmuakb.v_crm_ft_campaign_membership WHERE   (crm_wave_launch_dt >=  (TD_SYSFNLIB.OADD_MONTHS (DATE , -1 )))  /* Today – 1 month */     AND  (crm_campaign_enter_dt >=  (TD_SYSFNLIB.OADD_MONTHS (DATE , -1 ))) /* Today – 1 month */ ;

Poor performance detected

Query performance statistics stored in Teradata QueryLog signifies that the query is suspicious and performing badly in terms of unnecessary IO, excessive Spool Spool space and CPU utilization.

Take a look at Teradata performance metrics:

Teradata View Optimization

What knowledge can we discover from this data?

What is particularly wrong and how can we tune and improve the query?

A closer look at IO, source view DDL

First of all, let us describe what UII (Unnecessary IO indicator) is and what it can tell us.

UII is calculated as SumIO / (SumCPU * 1000). The metric is used to make an idea about query efficiency for consuming CPU and IO resources. If UII is relatively high then it could mean that many data blocks are read but a relatively small proportion of them is actually processed.

Let us examine source View DDL through which the target table is populated.

REPLACE VIEW SBX.V_CRM_FT_CAMPAIGN_MEMBERSHIP AS LOCKING ROW ACCESS SELECT     … FROM SBX.T_CRM_CAMPAIGN_MEMBER CCM     JOIN SBX.T_MD_CRM_CAMPAIGN  CP          ON  (             …             AND CP.WAVE_LAUNCH_DATE>=OADD_MONTHS(CURRENT_DATE,-6)             AND  CCM.CAMPAIGN_ENTER_DATE >=OADD_MONTHS(CURRENT_DATE,-6));

A-ha! Here we see that latest 6 month are extracted from the view.

The tables are partitioned by columns used in WHERE clause:

CREATE MULTISET TABLE SBX.T_CRM_CAMPAIGN_MEMBER ,FALLBACK ,      NO BEFORE JOURNAL,      NO AFTER JOURNAL,      CHECKSUM = DEFAULT,      DEFAULT MERGEBLOCKRATIO      (                …         ) PRIMARY INDEX ( MEMBER_ID , WAVE_ID ) PARTITION BY RANGE_N(CAMPAIGN_ENTER_DATE  BETWEEN DATE ‘2014-12-01’ AND DATE ‘2025-12-31’ EACH INTERVAL ‘1’ MONTH );

But only last month’s data is inserted in the target table! The INSERT statement is as:

INSERT INTO PRD3_1_db_dmcrmuakb.t_crm_ft_campaign_membership  (         …        ) SELECT  … FROM    PRD3_1_db_dmcrmuakb.v_crm_ft_campaign_membership WHERE   (crm_wave_launch_dt >=  (TD_SYSFNLIB.OADD_MONTHS (DATE , -1 )))  /* Today – 1 month */     AND  (crm_campaign_enter_dt >=  (TD_SYSFNLIB.OADD_MONTHS (DATE , -1 ))) /* Today – 1 month */ ;

So, the query simply reads 5 months data in vain with no particular reason. The thing is that before final query applies any WHERE clause, the whole data is read.

It means every data block under this VIEW is read first, and then non-relevant data is eliminated. But why read unnecessary data? Let us figure out how to force Teradata extract only relevant partitions.

Here is the answer:

1. We might put the WHERE clause from the final query INSIDE the source VIEW DDL.

2. We might create an entirely new VIEW for this ETL process in case if there are some other users who might not want any changes to the VIEW.

Improved query performance

This optimization gave us 5 times boost in performance. The query is completed 4-5 times faster and utilizes 4-5 times fewer system resources.

Detailed query metrics below:

Teradata View Optimization

 

Artemiy Kozyr is Data Engineer at Sberbank, Moscow with Master Degree in CS.
He has 5 years of Experience in Data Warehousing, ETL, Visualization for Financial Institutions.

Contacts:

[email protected]
http://linkedin.com/in/artemiykozyr/

Our Reader Score
[Total: 8    Average: 3.5/5]
Unneccessary IO elimination – Teradata View Optimization written by Artemiy Kozyr on October 29, 2018 average rating 3.5/5 - 8 user ratings

LEAVE A REPLY

Please enter your comment!
Please enter your name here