fbpx

1. Avoid multiple Joins to the same Table

Each join means either a full table scan or index access. In all your Teradata SQL queries, avoid multiple joins to the same table if possible. Often multiple joins can be bypassed by subqueries, volatile tables, or ordered analytic functions.

Here is an example of using a pivot in a subquery to prevent multiple access to the same table:

SELECT
	A.SUBSCRIBER_ID,
 	B.VALUE AS INCOMING,
 	C.VALUE AS OUTGOING
FROM 
	SUBSCRIBERS A
	LEFT JOIN  CALLS B ON A.SUBSCRIBER_ID = B.SUBSCRIBER_ID AND B.INOUT_CD  = 'IN'
	LEFT JOIN  CALLS C ON A.SUBSCRIBER_ID = C.SUBSCRIBER_ID AND C.INOUT_CD  = 'OUT'
WHERE
	B.CALL_DATE BETWEEN DATE '2019-01-01' AND DATE '2019-12-31'
	C.CALL_DATE BETWEEN DATE '2019-01-01' AND DATE '2019-12-31'
SELECT
	A.SUBSCRIBER_ID,
 	B.INCOMING,
 	B.OUTGOING
FROM 
	SUBSCRIBERS A
	LEFT JOIN  
	(
		SELECT 
			SUBSCRIBER_ID,
			MAX(CASE WHEN INOUT_CD  = 'IN' THEN VALUE END) AS AVN_OFF_BAL,
			MAX(CASE WHEN INOUT_CD  = 'OUT' THEN VALUE END) AS BAL_OFF_BAL
		FROM
			CALLS 
		WHERE
			CALL_DATE BETWEEN DATE '2019-01-01' AND DATE '2019-12-31'
			AND INOUT_CD IN ('IN', 'OUT')
		GROUP BY 1
	) B
ON A.SUBSCRIBER_ID = B.SUBSCRIBER_ID;

2. Avoid functions in Equi-Join Conditions

Most functions prevent ROWHASH access from being used when applied to a join column. These types of transformations are a sign of poor data modeling or poor data quality. If you solve the cause, then there is no need to use functions in joins.

a.SUBSCRIBER_ID = TRIM(LEADING '0' FROM b.SUBSCRIBER_ID)

3. Avoid the usage of UNION

A simple UNION causes a sorting and deduplication of the rows. This is a very expensive process. If it is ensured that the result sets cannot provide overlapping results, UNION ALL should always be used.

Instead of: 

SELECT   CAST ('FIRST'  AS VARCHAR (15)), PK FROM TheTable
UNION 
SELECT   CAST ('SECOND'  AS VARCHAR (15)), PK FROM TheTable;

Use:

SELECT   CAST ('FIRST'  AS VARCHAR (15)), PK FROM TheTable
UNION ALL
SELECT   CAST ('SECOND'  AS VARCHAR (15)), PK FROM TheTable;

4. Avoid GROUP BY over Expressions

Avoid grouping over expressions. For performance reasons, better use the underlying columns for the expression as grouping criteria, e.g., if you need to group by concatenated columns, use them in GROUP BY and not concatenated columns.

SELECT  COALESCE (SUBSCRIBER_ID ,0) || COALESCE(DESCRIPTION ,'') FROM THETABLE 
GROUP BY 
COALESCE (SUBSCRIBER_ID,0) || COALESCE(DESCRIPTION ,'');

Change to:

SELECT  COALESCE (SUBSCRIBER_ID ,0) || COALESCE(DESCRIPTION ,'') FROM THETABLE 
GROUP BY 
COALESCE (SUBSCRIBER_ID,0) , COALESCE(DESCRIPTION ,'');

5. Divide large Teradata SQL queries into smaller ones

A large single query can hit resource limits, especially SpoolSpace. To avoid this, try to split the query into smaller chunks. Good splitting points are UNION (ALL) statements. Multiple insert statements can replace these. Another splitting point could be JOINs.
Consider materializing complex sub-queries in a volatile table first.

6. Consider the creation of additional statistics and indexes

It’s all about statistics.
Optimization of SQL statements makes absolutely no sense if we do not have proper statistics defined.

But: Proper statistics does not mean we have to collect statistics at all.

We can let the optimizer do this as the optimizer always samples statistics on indexes if they are not explicitly defined.
Still, we have to ensure that the optimizer has for each table and column the correct picture about data demographics.

The first you should do is turn on diagnostics, as this will give you useful information about missing statistics:

DIAGNOSTIC HELPSTATS ON FOR SESSION;

Run the EXPLAIN statement afterward, and you will find at the end of the output a list of statistics the optimizer is suggesting.

Not all suggested statistics may make sense in your scenario, but they give you a good starting point.

Be careful: It’s not about adding all the statistics which are suggested!
You should definitely look for suggestions on columns used in JOINS and WHERE conditions (especially if not indexed).
Add them step by step, re-check the EXPLAIN output after each time you added statistics, and look for changes in the plan.
If not, drop the statistics again. Do not add them all at once as you will not be able anymore to find only the useful ones!

If the diagnostics suggest statistics on a column or a combination of columns that is unique or almost unique, go for sample statistics only (COLLECT STATISTICS USING SAMPLE ON COLUMN <(COLUMN)> as they are much faster and need fewer resources.

Check the plan for deviations between the estimated rows and the number of rows in the related table. This could be a hint that there are aged statistics on the table.

SHOW STATISTICS VALUES ON <TheTable>; is your best friend to get detailed inside on the statistics currently used by the Optimizer. It will even show you the skewed values of each column, which is valuable information when designing your query.

__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"62516":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"62516":{"val":"var(--tcb-skin-color-0)"}},"gradients":[]},"original":{"colors":{"62516":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"b4fbe":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"b4fbe":{"val":"rgb(241, 99, 52)"}},"gradients":[]},"original":{"colors":{"b4fbe":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
Previous Article
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"b4fbe":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"b4fbe":{"val":"rgb(241, 99, 52)"}},"gradients":[]},"original":{"colors":{"b4fbe":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
Next Article
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"62516":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"62516":{"val":"rgb(255, 0, 0)"}},"gradients":[]}}]}__CONFIG_colors_palette__
GET OUR TERADATA BOOK

Roland Wenzlofsky

Roland Wenzlofsky is an experienced freelance Teradata Consultant & Performance Trainer. Born in Austria's capital Vienna, he is building and tuning some of the largest Teradata Data Warehouses in the European financial and telecommunication sectors for more than 20 years. He has played all the roles of developer, designer, business analyst, and project manager. Therefore, he knows like no other the problems and obstacles that make many data warehouse projects fail and all the tricks and tips that will help you succeed.

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

You might also like

>