
Teradata Multiple Joins – A Query Tuning Showcase


GUEST POST BY ARTEMIY KOZYR

1. Summary

The purpose of this showcase is to demonstrate how to optimize a statement with multiple JOINs: an elegant way of tuning that leads the Teradata Optimizer to the optimal JOIN strategy, using data redistribution instead of duplication where appropriate.

Whenever you have complex logic with many joins, try to decompose it and identify the parts that perform badly. Gather the execution plan and track query performance and resource usage to determine the JOIN sequence that achieves the best performance.

The results are overwhelming: performing the JOINs in the optimal order reduced resource utilization by several orders of magnitude.

In some cases it is too complicated to tune a query with many joins and a lot of logic inside, so it can be useful to decompose it into several consecutive steps. Monitoring performance, transparency, and the execution plan then all become easier.

2. Heavy load Statement with lots of JOINs inside

Here is a real example of a query running in our production environment:

  • A heavy query run daily which consumes a lot of resources;
  • Takes place at the ETL staging area (pre-detailed layer);
  • Source system rows are enriched with surrogate keys and ETL metadata;

Inside, 12 LEFT JOINs take place.

The cumulative staging table (425M+ rows) is joined to the surrogate key tables (the largest has 125M+ rows).

Take a look at the source code:

LOCKING ROW FOR ACCESS

SELECT
_S.*,
_SK_ACC_DEV.TGT_ID,
_SK_PTY_CLIENT.TGT_ID,
_SK_AGRMNT.TGT_ID,
_SK_PTY_VSP.TGT_ID,
_SK_PTY_OSB.TGT_ID,
_SK_PROD.TGT_ID,
_SK_ACC_DEV_REASON_TYPE.TGT_ID,
_SK_DRN.TGT_ID,
_SK_UCFC.TGT_ID,
_SK_CONTR_SUBTYPE.TGT_ID,
_SK_CARD_BIN.TGT_ID

FROM SANDBOX.S0160000010008_CARD _S

LEFT JOIN SANDBOX.K_AGRMNT_NK07 AS _SK_AGRMNT
ON (_SK_AGRMNT.NK07_AGREEMENT_ID_W4 IS NOT NULL
AND _SK_AGRMNT.NK07_AGREEMENT_ID_W4=_S.CONTRACT_ID_W4 )

LEFT JOIN SANDBOX.K_PTY_NK12 AS _SK_PTY_CLIENT
ON (_SK_PTY_CLIENT.NK12_WAY4_CLIENT_ID IS NOT NULL
AND _SK_PTY_CLIENT.NK12_WAY4_CLIENT_ID=_S.CARD_CLIENT_ID_W4 )

LEFT JOIN SANDBOX.K_PROD_NK03 AS _SK_PROD
ON (_SK_PROD.NK03_PRODUCT_ID_W4=_S.PRODUCT_ID_W4 )

LEFT JOIN SANDBOX.K_ACCESS_DEVICE_NK10 AS _SK_ACC_DEV
ON (_SK_ACC_DEV.NK10_CARD_NUMBER IS NOT NULL
AND _SK_ACC_DEV.NK10_CARD_NUMBER=_S.CARD_NUMBER )

LEFT JOIN SANDBOX.S0160000010001_BRANCH _B
ON (_S.OSB_TB_NUMBER=_B.TB_OLD_NUMBER
AND _S.OSB_NUMBER=_B.OSB_NUMBER )

LEFT JOIN SANDBOX.K_PTY_NK13 AS _SK_PTY_VSP
ON _SK_PTY_VSP.NK13_VSP_FULL_NUMBER = _B.TB_NUMBER

LEFT JOIN SANDBOX.K_PTY_NK11 AS _SK_PTY_OSB
ON _SK_PTY_OSB.NK11_TB_NUMBER = _B.TB_NUMBER

LEFT JOIN SANDBOX.K_ACCESS_DEVICE_RSN_TYPE_NK01 AS _SK_ACC_DEV_REASON_TYPE
ON (_SK_ACC_DEV_REASON_TYPE.NK01_W4_STATUS_ID=_S.STATUS_ID
AND _SK_ACC_DEV_REASON_TYPE.NK01_W4_STATUS_ID IS NOT NULL)

LEFT JOIN SANDBOX.K_ACCESS_DEVICE_CLASS_VAL_NK05 AS _SK_DRN
ON (_SK_DRN.NK05_DRN=_S.DRN
AND _SK_DRN.NK05_DRN IS NOT NULL )

LEFT JOIN SANDBOX.K_ACCESS_DEVICE_CLASS_VAL_NK03 AS _SK_UCFC
ON (_SK_UCFC.NK03_UCFC=_S.UCFC
AND _SK_UCFC.NK03_UCFC IS NOT NULL )

LEFT JOIN SANDBOX.K_CONTR_SUBTYPE#WAY4_NK01 AS _SK_CONTR_SUBTYPE
ON (_SK_CONTR_SUBTYPE.NK01_CONTR_SUBTYPE_ID=_S.CONTR_SUBTYPE_ID
AND _SK_CONTR_SUBTYPE.NK01_CONTR_SUBTYPE_ID IS NOT NULL )

LEFT JOIN SANDBOX.K_ACCESS_DEVICE_CLASS_VAL_NK04 AS _SK_CARD_BIN
ON (_SK_CARD_BIN.NK04_CARD_BIN=_S.CARD_BIN
AND _SK_CARD_BIN.NK04_CARD_BIN IS NOT NULL );

3. Poorly performing query needs to be reviewed

The query has shown poor performance indicators and tends to be among the worst-performing statements according to the Query Log.

In recent weeks its resource consumption has only grown, so it needs to be reviewed and tuned.

Resources are consumed in a non-optimal way: this single session utilizes 40-45% of the CPU time of the whole ETL process for this source system!

Take a look at the query performance:
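If you want to pull these numbers yourself, a minimal sketch against the query log looks like this (DBC.QryLogV is assumed to be populated via DBQL; the user name filter is a placeholder):

/* Top resource consumers for the ETL user over the last week.
   Column names may differ slightly between Teradata releases. */
SELECT QueryID
, UserName
, StartTime
, AMPCPUTime                        /* total CPU seconds across all AMPs */
, TotalIOCount                      /* logical I/O count */
, SpoolUsage / 1024**3 AS Spool_GB  /* peak spool in GB */
FROM DBC.QryLogV
WHERE UserName = 'ETL_STG_USER'     /* placeholder user name */
AND CAST(StartTime AS DATE) >= CURRENT_DATE - 7
ORDER BY AMPCPUTime DESC;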

4. Examine the execution plan

It is always a tricky question: how can we tune the query and improve performance? Where should we start?

First of all, we dig deeper into the execution plan and try to identify the part of the query that performs badly.

Here we see the JOIN to the clients' surrogate key table. The SK table contains 125M+ rows and its data is distributed evenly across all AMPs. Yet the join is performed by duplicating the data to all AMPs!

10) We do an all-AMPs RETRIEVE step from a single partition of
SANDBOX._SK_PTY_CLIENT with a condition of … into Spool 26 (all_amps)
which is duplicated on all AMPs with hash fields …

 Then we do a SORT to order Spool 26 by row hash. The result spool file will not be cached in memory. The size of Spool 26 is estimated with no confidence to be 230,896,636,992 rows (10,852,141,938,624 bytes). The estimated time for this step is 16 minutes and 22 seconds. 

5. Approach: Statistics, JOIN INDEX, Sequence of JOINs

  1. Statistics gathering

First of all, make sure proper STATS are in place. If statistics are missing on the joining columns, collect them:

COLLECT STATISTICS COLUMN (PARTITION
, INFO_SYSTEM_INST_CD
, INFO_SYSTEM_TYPE_CD
, NK12_WAY4_CLIENT_ID)
ON SANDBOX.K_PTY_NK12 ;

HELP STATISTICS SANDBOX.K_PTY_NK12;

HELP STATISTICS SANDBOX.S016_0008_CARD;

Still, the Teradata Optimizer decides to duplicate the table and does not change the execution plan.
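If it is not obvious which additional statistics the optimizer would like to see, Teradata can append its own recommendations to the EXPLAIN output (a small sketch; the SELECT is just an abridged piece of the statement from section 2):

/* Ask the optimizer to list recommended statistics at the end of the EXPLAIN text. */
DIAGNOSTIC HELPSTATS ON FOR SESSION;

EXPLAIN
SELECT _S.*, _SK_PTY_CLIENT.TGT_ID
FROM SANDBOX.S0160000010008_CARD _S
LEFT JOIN SANDBOX.K_PTY_NK12 AS _SK_PTY_CLIENT
ON _SK_PTY_CLIENT.NK12_WAY4_CLIENT_ID IS NOT NULL
AND _SK_PTY_CLIENT.NK12_WAY4_CLIENT_ID = _S.CARD_CLIENT_ID_W4;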

  2. Introducing JOIN INDEX

Should we proceed to create a Join Index, Teradata's way of materializing views?

I tried a couple of different JI definitions, but unfortunately the Teradata Optimizer would not use them. Furthermore, a Join Index adds significant maintenance overhead, so it might not be the best option.

CREATE JOIN INDEX SANDBOX.JI_CC0160000010008_CARD
AS
SELECT
_S.CARD_ID_W4
, _S.CARD_CLIENT_ID_W4
, _SK_PTY_CLIENT.NK12_WAY4_CLIENT_ID
, _SK_PTY_CLIENT.TGT_ID
FROM SANDBOX.S0160000010008_CARD /*SANDBOX.S016_0008_CARD*/ _S
INNER JOIN SANDBOX.K_PTY_NK12 AS _SK_PTY_CLIENT
ON (_SK_PTY_CLIENT.NK12_WAY4_CLIENT_ID IS NOT NULL
AND _SK_PTY_CLIENT.NK12_WAY4_CLIENT_ID=_S.CARD_CLIENT_ID_W4 );

  3. JOIN sequence manipulation

Finally, let us try to force a particular sequence of joining tables and see the results. So how do you do this? You can simply decompose this large query into two consecutive ones:

1) First, perform all the JOINs except the one causing bad performance

2) JOIN the intermediate result to the remaining large table

How is the sequence of joins determined? It is up to Teradata's cost-based optimizer; there are no HINTs like in Oracle to force the optimizer to execute the statement the way you want.

By dividing the initial statement into two consecutive ones, we force Teradata into a specific join order. This approach brought the best results.

So you simply comment out the problematic JOIN in the first step and add it back in the second step:

/* LEFT JOIN SANDBOX.K_PTY_NK12 AS _SK_PTY_CLIENT
ON (_SK_PTY_CLIENT.NK12_WAY4_CLIENT_ID IS NOT NULL
AND _SK_PTY_CLIENT.NK12_WAY4_CLIENT_ID=_S.CARD_CLIENT_ID_W4) */
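A minimal sketch of this two-step variant, under the assumption that an intermediate volatile table is acceptable (the table name, abridged column list, and chosen PRIMARY INDEX are illustrative, not the original production code):

/* Step 1: all JOINs except the one to the large client SK table,
   landed in an intermediate (here: volatile) table. */
CREATE VOLATILE TABLE STG_CARD_STEP1 AS
(
SELECT
_S.*,
_SK_AGRMNT.TGT_ID AS AGRMNT_TGT_ID
/* ... the remaining SK columns and LEFT JOINs from the original statement,
   except the join to SANDBOX.K_PTY_NK12 ... */
FROM SANDBOX.S0160000010008_CARD _S
LEFT JOIN SANDBOX.K_AGRMNT_NK07 AS _SK_AGRMNT
ON _SK_AGRMNT.NK07_AGREEMENT_ID_W4 = _S.CONTRACT_ID_W4
) WITH DATA
PRIMARY INDEX (CARD_CLIENT_ID_W4)   /* distribute on the column used in step 2 */
ON COMMIT PRESERVE ROWS;

/* Step 2: join the intermediate result to the large client SK table only. */
SELECT _T.*, _SK_PTY_CLIENT.TGT_ID
FROM STG_CARD_STEP1 _T
LEFT JOIN SANDBOX.K_PTY_NK12 AS _SK_PTY_CLIENT
ON _SK_PTY_CLIENT.NK12_WAY4_CLIENT_ID IS NOT NULL
AND _SK_PTY_CLIENT.NK12_WAY4_CLIENT_ID = _T.CARD_CLIENT_ID_W4;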

6. Improved performance with data redistribution

Here is the new EXPLAIN plan excerpt for the second step of the query, now using data redistribution and performing a hash join:

3) We do an all-AMPs RETRIEVE step from a single partition of SANDBOX._SK_PTY_CLIENT with a condition of … into Spool 25 (all_amps). Spool 25 is redistributed by hash code to all AMPs. The size of Spool 25 is estimated with low confidence to be 127,257,826 rows. The estimated time for this step is 0.58 seconds.

11) We do an all-AMPs JOIN step from Spool 2, which is joined to Spool 23. Spool 25 is used as the hash table and Spool 23 is used as the probe table in a right outer joined using a classical hash join of 2 partitions with a join condition of … The size of Spool 28 is estimated with low confidence to be 387,595,663 rows (815,113,679,289 bytes). The estimated time for this step is 16.71 seconds.

The results are great: the performance indicators improved several times over, unlocking Teradata's full capacity.

Here are the performance indicators BEFORE and AFTER the optimization. I have grouped the two consecutive steps under one number for convenience.

Artemiy Kozyr is a Data Engineer at Sberbank, Moscow, with a Master's degree in CS.
He has 5 years of experience in Data Warehousing, ETL, and Visualization for financial institutions.

Contacts:

[email protected]
http://linkedin.com/in/artemiykozyr/

Boosting Retail Banking operations with improved JOIN performance and QUALIFY clause


GUEST POST BY ARTEMIY KOZYR

Executive summary

Today I shed some light on how Data Warehousing lies at the core of Retail Banking operations. We will look at a real case of a vital marketing process malfunctioning and dive beneath the surface to understand the technical issues behind the data.

You will learn how to approach such issues, see the big picture, identify flaws, and address them in the best possible way: auditing the execution plan, grasping performance indicators, understanding data distribution, refactoring and benchmarking join operations, reducing unnecessary IO, and applying analytical functions.

As a result, the treatment boosted a critical Data Mart calculation by a factor of 60, so our business users now get their data on time on a daily basis. That is why delivering value really matters. As usual, feel free to post any comments and questions below. I will be glad to answer and discuss them with you.

Data Warehousing vs. Retail Banking

Hi guys! Today I am going to talk about improving and accelerating vital Marketing operations in Retail Banking.

With all the disruptive threats coming from FinTech startups, modern Banks strive to become Data Driven. This means utilizing the whole potential of existing data in order to make meaningful and reliable business decisions.

This moves us to the core need of IT and Business working closely, understanding each other’s needs and capabilities and thus creating a synergetic effect.

The key process steps here are:

  • EXTRACT raw data
  • TRANSFORM and ENRICH data in order to CREATE knowledge
  • DELIVER value to Business Management
  • so they can derive conscious INSIGHTS and make useful DECISIONS
  • and do it on a TIMELY basis

We will barely scratch the surface of the process of preparing cross-channel marketing campaigns in Retail Banking, its core activities, departments, data, and the information systems involved in it, but we will dive deep into the technical realization of one particular step residing in the Teradata Data Warehouse.

Sometimes one small part can cause malfunction of the whole system. Descending into how it works beneath the surface level, analyzing technical details, system capabilities, and bottlenecks might be extremely interesting and rewarding.

The Big Picture and the overall process scheme looks like this:

  • Gather source data/customer interaction experience
  • Extract lots of customer markers/flags
  • Map with client’s master data (identity, contacts, etc.)
  • Enrich with CRM / Transactions / Scoring historical data
  • Segment database / calculate metrics
  • Prepare cross-channel marketing campaigns

Give me a thumbs up and any feedback if you would like to hear more about real case studies of applied DWH (data warehousing) / data cloud platforms in banking from a business perspective.

Things have gone wrong – Bottlenecks and Malfunction

A couple of days ago, business users faced a TLA (Time-Level Agreement) violation: the necessary business data was not ready by 9 AM. That created further bottlenecks and delays and resulted in an overall process malfunction.

Spheres affected by the issue:

  • Business Impact: TLA violation – creates a bottleneck for subsequent operations
  • Tech impact: Excessive System Resource Usage (thus cannibalizing resource from other routines)

The problem had to be addressed as soon as possible, so the IT team started an investigation. A closer look pointed to one query causing a serious issue inside the Teradata Data Warehouse. The amount of data had been rising gradually, but the average time to process it increased sharply at one point in time.

In terms of speed, this one particular query alone takes around 50-60 minutes to complete every day.

The basic question we want to ask is: what is the purpose of this query? What does it do?

Here is a brief description in business terms:

  1. Transfer customer interaction experience data from staging area.
  2. Map customer to CRM and Payment Card Processing systems through MDM (Master Data Management).
  3. Remove duplicates; take the customer’s actual ID only.

Technically speaking, a number of joins with some grouping and filtering applied.

In short, this non-optimal query needs serious treatment: we have to find and analyze the root causes, implement a solution, and test it in order to improve performance.

A top-down approach to address the problem

First of all, we need to understand how to approach this kind of problem in general.

There should be an effective and comprehensive plan to take everything into account, see the big picture, identify flaws and address them the best possible way. What would this approach look like?

In order to find the particular reasons, fix them and improve the situation we need to take several steps:

  • Analyze the source code
  • Gather historical performance metrics
  • Examine the execution plan and identify core problems
  • Investigate the actual data, its demographics, and distribution
  • Propose a viable solution to the issue
  • Implement and test it, assess results

Understanding the query purpose

First of all, we need to make sure we understand the purpose of every query from a business perspective. There should be no excessive or irrelevant data participating in the query unless it complies with business needs.

Let us take a deeper look at the source code of this particular query. I have left some comments down below so you can grasp the general purpose of every stage.

INSERT INTO SANDBOX.gtt_cl_nps_trg_1 (
client_mdm_id,
event_type_id,
event_date,
load_date,
id_target,
load_dt,
tag_v21,
name_product,
tag_v22,
tag_v23)
SELECT
cpty_2.src_pty_cd AS client_mdm_id,
t1.event_type_id,
t1.date_trggr AS event_date,
t1.load_date AS load_date,
t1.id_target AS id_target,
v_curr_dt AS load_dt,
t1.tag_v21,
t1.name_product,
t1.tag_v22,
t1.tag_v23
FROM
(
SELECT
cc.load_date,
cc.trggr_dt AS date_trggr,
cc.client_dk AS id_target,
cc.load_dt AS date_load,
/* 3.2. taking latest party key according to MDM system: */
Max(cpty_1.pty_id) AS pty_id_max,
event_type_id,
cc.tag_v21,
cc.name_product,
cc.tag_v22,
cc.tag_v23
FROM
/* 1. Gathering raw data: */
(SELECT
ncd.load_date,
sbx.load_dt,
sbx.trggr_dt,
sbx.client_dk,
eti.event_type_id,
CASE
WHEN eti.event_type_id IN ('22', '23', '24', '25', '26', '27', '28', '29') THEN sbx.tag_v21
ELSE Cast(NULL AS VARCHAR(200))
end AS tag_v21,
sbx.name_product,
sbx.tag_v22,
sbx.tag_v23
FROM SANDBOX.v_snb_xdl_dm_fvs_nps_sbx sbx
INNER JOIN SANDBOX.gtt_nps_calc_date ncd
ON ( sbx.load_dt = ncd.load_date )
INNER JOIN (
/* … additional data joins … */
) eti
) cc
/* 2. Enriching data with client master key – party external reference: */
LEFT JOIN SANDBOX.v_prd017_REF_PTY_EXTRNL_PTY ref
ON (Cast(cc.client_dk AS VARCHAR(40)) = ref.pty_host_id
AND ref.mdm_system_id=1201)
LEFT JOIN SANDBOX.v_prd017_pty cpty_1 ON ref.pty_id=cpty_1.pty_id
LEFT JOIN SANDBOX.v_prd017_pty_stts stts ON cpty_1.pty_id=stts.pty_id
WHERE stts.pty_stts_id=1401
AND stts.pty_stts_end_dt = DATE '9999-12-31'
/* 3. taking latest party key according to MDM system: */
GROUP BY cc.load_date,
cc.trggr_dt,
cc.client_dk,
cc.load_dt,
cc.event_type_id,
cc.tag_v21,
cc.name_product,
cc.tag_v22,
cc.tag_v23
) t1
/* 3.1. leave one row only with max(party_id) */
LEFT JOIN SANDBOX.v_prd017_pty cpty_2
ON t1.pty_id_max=cpty_2.pty_id
;

All in all, the following steps are performed:

  1. Getting raw data from view v_snb_xdl_dm_fvs_nps_sbx
  2. Enriching data with client master key – party external reference
  3. Removing duplicates – leaving latest entry – max(party_id)
  4. Inserting result set to be processed on next steps

Audit Execution Plan and Historical Performance

The best way to analyze historical data on query performance is to use PDCR (Performance Data Collection and Reporting) database and its visually engaging version – Teradata Viewpoint.

Each query ever executed on the RDBMS is stored with crucial metadata such as query text, user, start time, end time, resources consumed, and any errors that occurred.

So let us dig into some details from PDCR info database as well as Teradata Viewpoint Portlet.
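A minimal sketch of such a lookup (the PDCR history table PDCRINFO.DBQLogTbl_Hst and the QueryBand filter are assumptions; your site may expose different names):

/* Daily resource profile of the suspect statement over the last month. */
SELECT LogDate
, COUNT(*)                  AS Executions
, SUM(AMPCPUTime)           AS CPU_Seconds
, SUM(TotalIOCount)         AS IO_Count
, MAX(SpoolUsage) / 1024**4 AS Max_Spool_TB
FROM PDCRINFO.DBQLogTbl_Hst                     /* assumed PDCR history table name */
WHERE QueryBand LIKE '%JobName=NPS_TRG_LOAD%'   /* placeholder job identifier */
AND LogDate >= CURRENT_DATE - 30
GROUP BY LogDate
ORDER BY LogDate;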

What do the performance indicators say?

First of all, according to the PDCR database, the query utilizes an enormous amount of resources in comparison to the other queries of this particular Data Mart:

  1. CPU Time utilization ~ 160k CPU sec
  2. Spool Usage ~ 15 TB
  3. Total IO count ~ 55 million
  4. It takes around ~ 55 minutes to process on average

A tremendous amount of memory and CPU seconds are used to store and compute intermediate query results. Further steps take far fewer resources to process. So there are certainly several problems that need to be addressed.

Secondly, here are some pictures from Teradata Viewpoint Query Spotlight portlet:

We can see here the same performance metrics in a more visually appealing way.

The next step is an examination of the actual EXECUTION plan to discover which steps (operations) produce the heaviest load. According to the Explain tab in the Teradata Viewpoint portlet, the most expensive step is one particular JOIN, which took about 37 minutes (~90% of the time):

In this step the following actions are performed:

  1. Joining the master key table T_REF_PTY_EXTRNL_PTY, with around 319 billion actual rows involved.
  2. Duplicating these 319 billion rows to every AMP in the system.

Comprehend Data Demographics

Having identified the heaviest operation in the query, let us find out more about the data being joined.

Table T_REF_PTY_EXTRNL_PTY is used to accumulate customers' keys from different banking systems, so they can be matched in subsequent steps.

That helps to find out more about a customer's profile, loans, expenses, preferences, and behavior patterns. This kind of knowledge helps the bank offer personalized products and services at the appropriate time and in the appropriate situation.

We need to answer a number of simple questions:

  • How many rows are there in a table?
  • How many of them do we really need?
  • How selective is this data by specific columns?
  • Which data types are used and why?
  • How is the JOIN performed?

1. The table definition goes like this:

CREATE MULTISET TABLE SANDBOX.T_REF_PTY_EXTRNL_PTY , NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
mdm_system_id BIGINT NOT NULL ,
pty_host_id VARCHAR(40) CHARACTER SET Unicode NOT CaseSpecific NOT NULL,
pty_id BIGINT NOT NULL,
deleted_flag CHAR(1) CHARACTER SET Unicode NOT CaseSpecific NOT NULL DEFAULT 'N',
action_cd CHAR(1) CHARACTER SET Unicode NOT CaseSpecific NOT NULL,
workflow_run_id BIGINT ,
session_inst_id INTEGER ,
input_file_id BIGINT ,
info_system_id SMALLINT,
pty_extrnl_pty_id BIGINT NOT NULL,
pty_host_id_master_cd CHAR(1) CHARACTER SET Unicode NOT CaseSpecific
)
PRIMARY INDEX ( pty_id )
INDEX t_ref_pty_extrnl_pty_nusi_1 ( pty_extrnl_pty_id );

So we see that the pty_host_id column (the customer's id in different source systems) is of type VARCHAR(40). That is because some systems store the customer's identity with additional symbols like '_', '-', '#', etc., so the column as a whole cannot be stored as a numeric data type.

2. The data distribution by source systems goes like this:
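The distribution can be profiled with a simple aggregate over the reference table (a sketch; the window SUM just turns the counts into percentages):

/* Profile how the reference table is spread across source (MDM) systems. */
SELECT mdm_system_id
, COUNT(*)                                  AS row_cnt
, 100.00 * COUNT(*) / SUM(COUNT(*)) OVER () AS pct_of_table
FROM SANDBOX.T_REF_PTY_EXTRNL_PTY
GROUP BY mdm_system_id
ORDER BY row_cnt DESC;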

Most importantly, we only need a portion of this data! Note that in the source query only one specific source system is mapped, with the following filtering clause:

WHERE    mdm_system_id=1201 

So we do not need all the data from the table; we only need around 20% of it.

Moreover, this portion of the data is easily cast to a numeric data type, which joins far more efficiently than a string field. In one of the next sections, I will show some benchmarking and a comparison of different join types.

3. Statistics of data for the optimizer

Make sure statistics are present and up to date on the tables and columns participating in a query. Statistics are one of the most crucial inputs on which the execution plan is based. Missing or stale statistics may result in Teradata choosing a wrong execution plan and throwing an error such as 'No more spool space'.

Discover how statistics are collected on a table with the SHOW command:

SHOW STATS ON SANDBOX.v_prd017_REF_PTY_EXTRNL_PTY ;

COLLECT STATISTICS
— default SYSTEM SAMPLE PERCENT
— default SYSTEM THRESHOLD PERCENT
COLUMN ( deleted_flag,pty_host_id_master_cd,pty_id ) ,
COLUMN ( mdm_system_id,deleted_flag,pty_host_id ) ,
COLUMN ( deleted_flag,pty_host_id_master_cd ) ,
COLUMN ( pty_id ) ,
COLUMN ( deleted_flag ) ,
COLUMN ( pty_host_id_master_cd ) ,
COLUMN ( mdm_system_id ) ,
COLUMN ( pty_host_id ) ,
COLUMN ( mdm_system_id,pty_id ) ,
COLUMN ( pty_extrnl_pty_id )
ON SANDBOX.T_REF_PTY_EXTRNL_PTY ;

To see the actual statistics values and the collection date, use the HELP command:

HELP STATS ON SANDBOX.v_prd017_REF_PTY_EXTRNL_PTY;

Addressing core problems


Taking everything into account, in order to improve the query performance we have to achieve several goals:

  1. Retrieve only the necessary portion of data from the largest table prior to joining.
  2. JOIN on numeric data wherever possible.
  3. Eliminate the duplication of a colossal amount of data on all AMPs.
  4. Use the QUALIFY clause to eliminate the unnecessary JOIN, GROUP BY, and WHERE filtering.

Step #1. Joining the right and easy way

First of all, let us add a WHERE clause to filter the rows we want to JOIN before the join itself, as we only need around 20% of the whole table.

Secondly, let us cast this data to a numeric data type (BIGINT), as this boosts performance significantly.

Below are the BEFORE and AFTER versions of the source code:

/* BEFORE STATEMENT */

LEFT JOIN SANDBOX.v_prd017_REF_PTY_EXTRNL_PTY ref
ON (Cast(cc.client_dk AS VARCHAR(40)) = ref.pty_host_id
AND ref.mdm_system_id=1201)

/* AFTER STATEMENT */

LEFT JOIN
(SELECT
PTY_HOST_ID
, Cast(Trim(PTY_HOST_ID) AS BIGINT) AS PTY_HOST_ID_CAST
, pty_id
FROM SANDBOX.T_REF_PTY_EXTRNL_PTY
WHERE mdm_system_id=1201
) ref ON cc.id_target = ref.pty_host_id_cast

As a result, these simple steps improve the overall query performance drastically and make the Teradata optimizer choose a redistribution strategy instead of duplication (which takes a tremendous amount of time).

Furthermore, I have done some detailed testing to benchmark different kinds of JOIN operation:

  • BIGINT = BIGINT
  • CAST(VARCHAR as BIGINT) = BIGINT
  • CAST(BIGINT as VARCHAR) = VARCHAR

The table with the performance metrics is below:

As you can see, the best type of JOIN is on equality of numeric data types (e.g. BIGINT). This kind of join consumes 4 times fewer CPU seconds and almost 2 times fewer IO operations, because casting data types and comparing strings character by character requires far more CPU overhead. Joining on string fields also requires 25-30% more spool space.
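For reference, the three variants above correspond to statements of the following shape (a sketch with placeholder table and column names, not the production objects; the metrics were taken from DBQL for each run):

/* Variant 1: BIGINT = BIGINT (the winner in the comparison above). */
SELECT COUNT(*)
FROM big_fact f
JOIN ref_keys r ON f.key_bigint = r.key_bigint;

/* Variant 2: CAST(VARCHAR AS BIGINT) = BIGINT. */
SELECT COUNT(*)
FROM big_fact f
JOIN ref_keys r ON CAST(TRIM(r.key_varchar) AS BIGINT) = f.key_bigint;

/* Variant 3: CAST(BIGINT AS VARCHAR) = VARCHAR. */
SELECT COUNT(*)
FROM big_fact f
JOIN ref_keys r ON CAST(f.key_bigint AS VARCHAR(40)) = r.key_varchar;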

Step #2. QUALIFYing to remove duplicates

Instead of excessive joining, grouping, and applying an aggregate function, we can use Teradata's QUALIFY clause to keep only the rows we need. This syntax helps us execute the query in a more efficient and intelligent way. Let us see how:

/* BEFORE STATEMENT */

/* 3.3. taking latest party key according to MDM system: */
Max(cpty_1.pty_id) AS pty_id_max,

/* 3.1. taking latest party key according to MDM system: */
GROUP BY cc.load_date,
cc.trggr_dt,
cc.client_dk,
cc.load_dt,
cc.event_type_id,
cc.tag_v21,
cc.name_product,
cc.tag_v22,
cc.tag_v23

/* 3.2. leave one row only with max(party_id) */
LEFT JOIN SANDBOX.v_prd017_pty cpty_2
ON t1.pty_id_max=cpty_2.pty_id

/* AFTER STATEMENT */

QUALIFY Row_Number() Over (PARTITION BY cc.load_date,
cc.trggr_dt,
cc.client_dk,
cc.load_dt,
event_type_id,
cc.tag_v21,
cc.name_product,
cc.tag_v22,
cc.tag_v23
ORDER BY cpty_1.pty_id DESC
) = 1

Applying the QUALIFY clause enables us to keep only one actual data entry per client. To learn more about the QUALIFY clause, Row_Number(), and other ordered analytical functions, please refer to the documentation.

Assessing the results

As a result of this analysis, the next step is to implement this new source code, deploy it on testing and production environments and assess the results.

First of all, how has the execution plan changed?

  • The number of actual rows processed has decreased by a factor of about 2,000: from 319 billion to 157 million.
  • The actual time of this step has fallen from 70 minutes to 26 seconds (about 160 times faster).
  • We have got rid of duplicating all rows to all AMPs by redistributing this chunk of data instead.

See the actual performance metrics below:

  • Total CPU time required for processing has fallen 60-fold because we now join and access the data in the best possible way
  • We save around 50 million IO operations on data blocks by not processing unnecessary data
  • Consequently, the amount of Spool Space used to store intermediate results has decreased by 55 times from 14 TB to 250 GB.
  • Most importantly, our query now takes only around 1 minute to process.

Notice how we managed to improve crucial performance indicators, system resource utilization and, most importantly, the overall time required to process the data. Tuning this particular query shortened our Data Mart calculation by almost 60 minutes, so our business users get their data on time on a daily basis.

Artemiy Kozyr is a Data Engineer at Sberbank, Moscow, with a Master's degree in CS.
He has 5 years of experience in Data Warehousing, ETL, and Visualization for financial institutions.

Contacts:

[email protected]
http://linkedin.com/in/artemiykozyr/

Unnecessary IO Elimination – Teradata View Optimization


GUEST POST BY ARTEMIY KOZYR

Summary

Today I am going to show you how to identify the root cause and tune the performance of a query doing an unnecessary amount of IO, what the UII indicator is, and how a WHERE clause placed on top of a VIEW is evaluated.

This case shows how a WHERE clause placed on a VIEW over two partitioned tables resulted in excessive block reads, with the WHERE condition applied only afterward. You need to be fully aware of the amount of data a query reads versus the amount of data it really needs.

Performance tuning boosted the performance fivefold: the query now completes 4-5 times faster and utilizes 4-5 times fewer system resources.


1+ Billion rows INSERTED

Here is the target query for today, which populates a large fact table of the data mart.

  • INSERT INTO … SELECT FROM …
  • The query is run on a daily basis
  • Over 1000M rows inserted (1 billion)

INSERT INTO PRD3_1_db_dmcrmuakb.t_crm_ft_campaign_membership ( … )
SELECT …
FROM PRD3_1_db_dmcrmuakb.v_crm_ft_campaign_membership
WHERE (crm_wave_launch_dt >= TD_SYSFNLIB.OADD_MONTHS(DATE, -1))     /* Today - 1 month */
AND   (crm_campaign_enter_dt >= TD_SYSFNLIB.OADD_MONTHS(DATE, -1)); /* Today - 1 month */

Poor performance detected

Query performance statistics stored in the Teradata query log show that the query is suspicious and performing badly in terms of unnecessary IO, excessive spool space, and CPU utilization.

Take a look at Teradata performance metrics:


What knowledge can we discover from this data?

What is particularly wrong and how can we tune and improve the query?

A closer look at IO, source view DDL

First of all, let us describe what UII (Unnecessary IO indicator) is and what it can tell us.

UII is calculated as SumIO / (SumCPU * 1000). The metric gives an idea of how efficiently a query consumes CPU and IO resources. If UII is relatively high, it can mean that many data blocks are read but only a relatively small proportion of them is actually processed.
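Computed straight from the query log, the indicator looks like this (a sketch against DBC.QryLogV; NULLIFZERO guards against queries with zero reported CPU):

/* Unnecessary IO Indicator per query for today: SumIO / (SumCPU * 1000). */
SELECT QueryID
, UserName
, AMPCPUTime
, TotalIOCount
, TotalIOCount / NULLIFZERO(AMPCPUTime * 1000) AS UII
FROM DBC.QryLogV
WHERE CAST(StartTime AS DATE) = CURRENT_DATE
ORDER BY UII DESC;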

Let us examine source View DDL through which the target table is populated.

REPLACE VIEW SBX.V_CRM_FT_CAMPAIGN_MEMBERSHIP AS
LOCKING ROW ACCESS
SELECT
…
FROM SBX.T_CRM_CAMPAIGN_MEMBER CCM
JOIN SBX.T_MD_CRM_CAMPAIGN CP
ON ( …
AND CP.WAVE_LAUNCH_DATE >= OADD_MONTHS(CURRENT_DATE, -6)
AND CCM.CAMPAIGN_ENTER_DATE >= OADD_MONTHS(CURRENT_DATE, -6));

A-ha! Here we see that the latest 6 months are extracted through the view.

The tables are partitioned by the columns used in the WHERE clause:

CREATE MULTISET TABLE SBX.T_CRM_CAMPAIGN_MEMBER, FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
…
)
PRIMARY INDEX ( MEMBER_ID, WAVE_ID )
PARTITION BY RANGE_N(CAMPAIGN_ENTER_DATE BETWEEN DATE '2014-12-01' AND DATE '2025-12-31' EACH INTERVAL '1' MONTH);

But only the last month's data is inserted into the target table! The INSERT statement is:

INSERT INTO PRD3_1_db_dmcrmuakb.t_crm_ft_campaign_membership ( … )
SELECT …
FROM PRD3_1_db_dmcrmuakb.v_crm_ft_campaign_membership
WHERE (crm_wave_launch_dt >= TD_SYSFNLIB.OADD_MONTHS(DATE, -1))     /* Today - 1 month */
AND   (crm_campaign_enter_dt >= TD_SYSFNLIB.OADD_MONTHS(DATE, -1)); /* Today - 1 month */

So the query simply reads 5 months of data in vain, for no particular reason. The thing is that before the final query applies its WHERE clause, all the data behind the view is read.

It means every data block under this VIEW is read first, and only then is the non-relevant data eliminated. But why read unnecessary data? Let us figure out how to force Teradata to read only the relevant partitions.

Here is the answer:

1. We might put the WHERE clause from the final query INSIDE the source VIEW DDL.

2. We might create an entirely new VIEW for this ETL process, in case other users do not want any changes to the existing VIEW; a sketch of this option follows below.
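A sketch of option 2: a dedicated view for this ETL load with the one-month filter pushed down to the base tables, so partition elimination can kick in (the view name and abridged column list are illustrative):

REPLACE VIEW SBX.V_CRM_FT_CAMPAIGN_MEMBERSHIP_1M AS
LOCKING ROW FOR ACCESS
SELECT CCM.*                       /* in practice: the same column list as the original view */
FROM SBX.T_CRM_CAMPAIGN_MEMBER CCM
JOIN SBX.T_MD_CRM_CAMPAIGN CP
ON ( /* ... the original join conditions ... */
     CP.WAVE_LAUNCH_DATE >= OADD_MONTHS(CURRENT_DATE, -1)
AND  CCM.CAMPAIGN_ENTER_DATE >= OADD_MONTHS(CURRENT_DATE, -1));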

Improved query performance

This optimization gave us a fivefold boost in performance: the query completes 4-5 times faster and utilizes 4-5 times fewer system resources.

Detailed query metrics below:


Artemiy Kozyr is a Data Engineer at Sberbank, Moscow, with a Master's degree in CS.
He has 5 years of experience in Data Warehousing, ETL, and Visualization for financial institutions.

Contacts:

[email protected]
http://linkedin.com/in/artemiykozyr/

Turning on the Teradata Locking Logger


by Vasudeva Rao


First, we enable the locking logger via the DBS console window or the cnsterm subsystem, if it is not enabled already. The setting does not take effect until the database is restarted. The following is the dbscontrol help text for the flag that controls the locking logger:

9. LockLogger – This Field defines the system default for the locking logger. This allows the DBA to log the delays caused by database locks to help in identifying lock conflicts. To enable this feature set the field to TRUE. To disable the feature set the field to FALSE.

If LockLogger is already set to TRUE in DBScontrol, then this is not necessary and “dumplocklog” can be executed as explained further below.

The following is how to use the cnsterm subsystem from the UNIX command prompt on the PDN node (root privilege is required):

# cnsterm 6
> start dbscontrol

[break] <=== use the key defined as intr (interrupt a.k.a. break) in the output of stty -a   ( Control Key + C)

# cnsterm 1
> display general <=== if flag 9 (LockLogger) is already set to true, you do not need to modify it.

[ output follows]

> modify general 9=true
> write
> quit;

[break]

If the LockLogger flag has been altered from ‘false’ to ‘true’, a database restart must occur for the change to take effect. There are no ill effects to having the flag set to true while waiting for a good time to restart the database. The following shows how to issue a TPARESET command from the UNIX command line and the cnsterm subsystem.

# cnsterm 6

> query state

TPA is in the state: Logons are enabled – The system is quiescent <=== verify there are no requests in progress which will require a long rollback

> restart tpa dump=no coldwait enable locking logger

# rallsh -sv /usr/pde/bin/pdestate
<< check for TPA >>>>

After a database restart with the LockLogger flag set to true, the Locking Logger will begin to accumulate lock information into a circular memory buffer of 64KB.

Depending on how frequently the system encounters lock contention, this buffer will wrap, but it will usually span several days. Following a period of lock contention, to analyze the lock activity you need to run the dumplocklog utility, which moves the data from the memory buffer to a database table where it can be queried.

Using "dumplocklog" to retrieve the information recorded by the Locking Logger:

As with dbscontrol, dumplocklog is available at the DBS console and cnsterm. We show the procedure for cnsterm:

# cnsterm 6
> start dumplocklog
[break]

# cnsterm 1
Enter your logon string: <username,password> (Q/q to quit)
> systemfe, service

Do you want this utility to run continuously? (Y/N)

>n <=== see the note below regarding the possible impact of continuous operation.

Enter number of sessions (? For Help, Q/q to Quit):
>2 <=== see the note below regarding sessions
2 sessions will be logged on.

Enter the Character Set (? For Help, Q/q to Quit):
> 127

…You have chosen character set <ASCII >
Enter the table name where the lock log entries will be written (? For Help, Q/q to Quit):
> locklog

Do you want this utility to create table “SYSTEMFE”.”LOCKLOG”? (Y/N)
> y

Table “SYSTEMFE”.”LOCKLOG” has been created.

Enter the time constraint to control selection of lock log entries that were generated AT or LATER than this time YYYYMMDD HHMMSS

(? For Help, Q/q to Quit):
> 20001116 120000

Enter the time constraint to control the selection of lock log entries that were generated AT or PRIOR TO this time YYYYMMDD HHMMSS

(? For Help, Q/q to Quit):
> 20001117 235959

Enter the object constraint for selection of lock log entries (? For Help, Q/q to Quit):

> *     # * for all objects or <database>.<tablename> for specific objects. Type ? for help on syntax options

> [return] Writing lock log entries to table “SYSTEMFE”.”LOCKLOG”.

Press <F2> any time to stop the utility.

### At this point it may take some time until a message like below appears.

> 2,616 rows have been inserted to table “SYSTEMFE”.”LOCKLOG”.

*** DumpLockLog is terminated ***

The data now resides in the database table you requested and allows all the query flexibility of any table. Please refer to the manual for the complete table definition. However, we offer the following as a complete, plain vanilla output from the table:

/* Generate a report based on the log output listing the users in lock conflict */

.set width 65531
.export file locklog.out
LOCKING ROW FOR access
select
begdate (format 'mm/dd') (named "Date")
,begtime (format '99:99:99') (named "Time")
,delay (format '999:99:99.9') (named "Delay")
,a.username (format 'X(18)') (named "BlockingUser")
,trim(dbase.databasename) ||'.'|| trim(tvm.tvmname) (named "DBandTable")
,stmttype (named "StatementType")
,processor
,blkingsessno (format 'z(9)999') (named "BlockingSession")
,blkinglevel (named "BlockingLevel")
,blkingmode (named "BlockingMode")
,b.username (format 'X(18)') (named "BlockedUser")
,blkdsessno (format 'z(9)999') (named "BlockedSession")
,blkdlevel (named "BlockedLevel")
,blkdmode (named "BlockedMode")
,deadlock
,multipleblocker (named "MultipleBlocker")
/* change the FROM clause so it points to your lock log table */
from
systemfe.locklog
left outer join dbc.tvm
on tid = tvm.tvmid
left outer join dbc.dbase
on dbid = dbase.databaseid
left outer join dbc.eventlog a
on (blkingsessno = a.sessionno and blkingloghost = a.logicalhostid and begdate <= a.datefld and begdate >= a.logondate and a.event = 'logoff')
left outer join dbc.eventlog b
on (blkdsessno = b.sessionno and blkdloghost = b.logicalhostid and begdate <= b.datefld and begdate >= b.logondate and b.event = 'logoff')
order by 1,2,3;
.export reset

/* To find the number of outstanding locks for a given 10-minute period */

SELECT begdate (TITLE 'Date')
,((begtime / 1000)(INTEGER)) * 1000 (FORMAT '99:99:99') (TITLE 'Time')
,COUNT(*) (TITLE '# locks')
FROM locklog
GROUP BY 1,2
ORDER BY 1,2;

For more complex analysis, refer to the Teradata Utilities Manual: Chapter 2 – Locking Logger Utility, Producing a Lock Log Report.

Additional Info/Comments:

Question:

A customer wants to activate Locking Logger during certain periods when they perceive potential locking issues. As an example, they want to have Locking Logger active and logging on Tuesday mornings from 7:00A to noon.

Outside of this period they want Locking Logger to be available but inactive (not logging). From their interpretation, they believe they have to do a TPA reset every time which is not acceptable to them. What is the recommended procedure?

Answer:

The recommended procedure is to enable the background facility once (which will require a tpareset). Once this is done, the locking activity will be written to a 64K-byte circular buffer of about 1500 entries per amp.

We recommend that you simply let it do this because the performance impact is negligible. It is not worthwhile to attempt to stop the internal logging process. We have a large number of customers that run permanently in this fashion without any noted degradation in response time or throughput.

Using the output of the circular buffer, generate Locking Log reports either periodically or continuously. I do not recommend that this be done continuously.

In terms of session overhead and table space the continuous mode can have a significant impact (see also the related question below). The recommended approach is to invoke the report generator immediately following the analysis period and to specify a conservative number of sessions (start with 1 per node and adjust it).

Question:

When continuous mode is chosen the area that controls the number of sessions goes gray and unclickable. The value in that field is 1, so I am unclear why locking logger in continuous mode is using one session per vproc. Does Teradata know of a way to force locking logger to use only one session total in continuous mode?

Answer:

We disabled the choice for selecting the number of sessions in CONTINUOUS mode because, by choosing CONTINUOUS, the user basically lets the LOCKING LOGGER take over complete control.

The CONTINUOUS mode was originally designed to be similar to a background execution of the LOCKING LOGGER. The locking logger's primary function is to take data from the lock buffer in memory and insert it into a table on disk. Because these inserts can be done in parallel, we allow multiple sessions to do the INSERT.

However, these are not just regular sessions; they are so-called CONSOLE sessions. There is a limit of 6 console sessions per PE vproc. And depending on what is running out there, sometimes there are not enough of these sessions per AMP (they get used up by utilities).

As the LOCKING LOGGER is just a utility, we cannot risk taking console sessions from other, more important jobs. However, we cannot adjust the number of sessions dynamically once it is set, so we have to choose a number based on the situation at the time the locking logger is activated. This number is chosen so that it leaves enough sessions for other utilities yet still allows sufficient parallelism for the locking logger's inserts.

We already know that this number might not make all users happy, and thus we advise users NOT to use the CONTINUOUS mode if they can help it. Remember that the LOCKING LOGGER is a HISTORICAL recording tool; whether the recorded data covers 6 months or 2 seconds, it does not justify running in CONTINUOUS mode.


Tuning the Teradata LIKE Operator


Usually, queries using the LIKE operator cause a full table scan.

If the LIKE pattern matches from the left, the Teradata Optimizer can quickly search the statistics histograms to get demographic metrics such as the number of matching rows.

Unfortunately, the Teradata Optimizer has no way to use statistics for a LIKE match such as LIKE '%ABC%'. In this case, it assumes pessimistic selectivity and always chooses the full table scan as the least expensive access path. The full table scan can be done on the base table, a join index, or a NUSI.

Still, in the absence of additional qualifiers against other columns, no indexed access is possible.

As mentioned above, the Teradata Optimizer may choose to do the full table scan on a smaller NUSI (if the number of distinct values is high) or a Join Index.

But as this is not ensured, we can use a trick to enforce the full table scan on a smaller table.

Let’s assume that the big table we are accessing with the LIKE operator looks like this:

CREATE SET TABLE Table1
(
PK INTEGER NOT NULL,
colA VARCHAR(100),
colB VARCHAR(100),
colC VARCHAR(100),
/* ... */
colZ VARCHAR(100)
) UNIQUE PRIMARY INDEX (PK);

Below is a typical query against Table1:

SELECT * FROM Table1 WHERE colA LIKE '%ABC%';
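For contrast, a left-anchored pattern on the same column lets the optimizer estimate selectivity from the colA histogram and consider indexed access; it does not help the unanchored case above (a generic sketch):

/* Left-anchored pattern: statistics on colA can be used to estimate selectivity. */
SELECT * FROM Table1 WHERE colA LIKE 'ABC%';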

To avoid the full table scan of the big table, we create a second table, only containing the minimum set of columns required for the LIKE match (in our example we only need “colA”) plus the unique primary key column “PK”:

CREATE SET TABLE Helper
(
PK INTEGER NOT NULL,
colA  VARCHAR(100)
) UNIQUE PRIMARY INDEX (PK);

INSERT INTO Helper SELECT PK, colA FROM Table1;
COLLECT STATISTICS ON Helper COLUMN(PK);

Finally, we transform the original SQL statement:

SELECT t01.*
FROM
Table1 t01
INNER JOIN
Helper t02
ON
t01.PK = t02.PK
WHERE t02.colA LIKE '%ABC%';

Teradata will now do a full table scan on the smaller table “Helper”, and an AMP-local rowkey based merge join with the big table “Table1”.

Of course, the benefit is only given if scanning the big table is much more expensive than scanning the small table and doing a consecutive AMP-local merge join.


Teradata Recursive Queries


Teradata SQL offers two methods to create a recursive query. We can either create a query using the WITH RECURSIVE clause or create a view using the CREATE RECURSIVE VIEW statement.

Each recursive query consists of 2 or 3 parts, depending on which of the above approaches is used:

  • The seed statement is the initial query which is executed.
  • The recursive statement is the repeating query which is executed until no further rows are returned. To avoid infinite loops, we usually add a termination condition to the recursive statement. Without a termination condition, the user may run out of spool space.
  • The final query returns the result of the seed query and all iterations of the recursive query.

Creating a recursive query using the WITH RECURSIVE clause:

WITH RECURSIVE <query>
(<columns>) AS
(
<seed query>
UNION ALL
<recursive query>
)
<final query>;

Creating a recursive query using a RECURSIVE VIEW:

CREATE RECURSIVE VIEW <view>
(<columns>) AS
(
<seed query>
UNION ALL
<recursive query>
);

To get a better understanding of how recursive queries work on Teradata, I prepared a common problem we want to solve: Finding the shortest paths in a graph.

Shortest Path – The Recursive Solution

We will store all available legs in the table below:

CREATE SET TABLE Leg
(
From_Id INTEGER NOT NULL,
To_Id INTEGER NOT NULL
)
PRIMARY INDEX (From_Id);

/* Example Graph */

INSERT into Leg VALUES ('1','2');
INSERT into Leg VALUES ('2','1');
INSERT into Leg VALUES ('2','3');
INSERT into Leg VALUES ('2','7');
INSERT into Leg VALUES ('3','2');
INSERT into Leg VALUES ('3','4');
INSERT into Leg VALUES ('3','5');
INSERT into Leg VALUES ('3','6');
INSERT into Leg VALUES ('6','7');
INSERT into Leg VALUES ('6','3');
INSERT into Leg VALUES ('7','2');
INSERT into Leg VALUES ('7','6');
INSERT into Leg VALUES ('7','8');
INSERT into Leg VALUES ('7','9');
INSERT into Leg VALUES ('8','7');
INSERT into Leg VALUES ('9','7');
INSERT into Leg VALUES ('5','3');
INSERT into Leg VALUES ('4','3');

The solution with a recursive query:

WITH RECURSIVE ThePath
(From_Id, To_Id, Path, TheLength) AS
(
SELECT
From_Id
,To_Id
,(TRIM(From_Id) || ',' || TRIM(To_Id)) (VARCHAR(512)) AS Path
, 1 AS TheLength
FROM
Leg
WHERE
From_Id = '1'
UNION ALL
SELECT
ThePath.From_Id
,t01.To_Id
,ThePath.Path || ',' || TRIM(t01.To_Id)
,ThePath.TheLength + 1 AS TheLength
FROM
Leg t01
INNER JOIN
ThePath
ON
t01.From_Id = ThePath.To_Id
WHERE POSITION(',' || TRIM(t01.To_Id) || ',' IN ',' || ThePath.Path || ',') = 0
— Above WHERE condition ensures that we do not revisit a node a second time!
AND ThePath.TheLength <= 100
— Avoid out of spool situations, put a fixed stop after 100 recursions!
)

/* Below statement ensures that if there are multiple routes between two nodes, one of
the minimum numbers of stops are chosen */

SELECT
From_Id,
To_Id,
Path,
TheLength
FROM ThePath
QUALIFY ROW_NUMBER() OVER (PARTITION BY From_Id, To_Id ORDER BY TheLength, Path) = 1
ORDER BY 1,4,3;

Here is the result set of the query, showing all minimum-distance routes from node 1:

FROM_ID  TO_ID  PATH     TheLength
1        2      1,2      1
1        3      1,2,3    2
1        7      1,2,7    2
1        4      1,2,3,4  3
1        5      1,2,3,5  3
1        6      1,2,3,6  3
1        8      1,2,7,8  3
1        9      1,2,7,9  3

Shortest Path – The Non-Recursive Solution

We will now compare the recursive query with a solution written as Stored Procedure:

CREATE SET TABLE Route
(
From_Id INTEGER NOT NULL,
To_Id INTEGER NOT NULL,
Path VARCHAR(512),
TheLength INTEGER
) PRIMARY INDEX (From_Id);

REPLACE PROCEDURE ThePath()
DYNAMIC RESULT SETS 1
BEGIN
DELETE FROM Route;
INSERT INTO Route
SELECT
From_Id,
To_Id,
(TRIM(From_Id) || ',' || TRIM(To_Id)) (VARCHAR(512)) AS Path,
1 AS TheLength
FROM Leg
WHERE From_Id = '1';

WHILE ACTIVITY_COUNT > 0 DO
INSERT INTO Route
SELECT
t02.From_Id,
t01.To_Id,
t02.Path || ',' || TRIM(t01.To_Id),
t02.TheLength + 1
FROM Leg t01, Route t02
WHERE
t01.From_Id = t02.To_Id
AND t01.To_Id <> '1'
AND t02.TheLength =
(SELECT MAX(TheLength) FROM Route)
AND t01.To_Id NOT IN
(SELECT To_Id FROM Route)
AND t02.TheLength < 200
QUALIFY ROW_NUMBER() OVER (PARTITION BY t01.To_Id ORDER BY t02.Path) = 1;
END WHILE;
BEGIN
DECLARE mycursor CURSOR WITH RETURN ONLY FOR
SELECT * FROM Route ORDER BY 4, 3;
OPEN mycursor;
END;
END;

In general, it cannot be said which solution is faster or slower; it depends on how the accessed data structures look.

In this specific example, the stored procedure consumes fewer IOs and less CPU than the recursive query.

There are several reasons:

  • Our stored procedure keeps a routes table of all visited nodes, while the recursive query might revisit the same node several times.
  • The recursive query continues to iterate even after the shortest path between two nodes has already been found. Running additional recursive steps increases the spool usage quickly.
Questions?
If you have any questions about all this, please ask in the comments! I’ll be paying close attention and answering as many as I can. Thank you for reading. Whatever this blog has become, I owe it all to you.
Roland Wenzlofsky
Roland Wenzlofsky is a graduated computer scientist and Data Warehouse professional working with the Teradata database system for more than 15 years. He is experienced in the fields of banking and telecommunication with a strong focus on performance optimization.
