Teradata SQL Queries: 6 Effective Tricks to optimize them

1. Avoid multiple Joins to the same Table

Each join means either a full table scan or index access. In all your Teradata SQL queries, avoid multiple joins to the same table if possible. Often multiple joins can be bypassed by subqueries, the use of volatile tables, or ordered analytic functions.

Here is an example of how to use a pivot in a subquery to prevent multiple access to the same table:

SELECT
	A.SUBSCRIBER_ID,
 	B.VALUE AS INCOMING,
 	C.VALUE AS OUTGOING
FROM 
	SUBSCRIBERS A
	LEFT JOIN  CALLS B ON A.SUBSCRIBER_ID = B.SUBSCRIBER_ID AND B.INOUT_CD  = 'IN'
	LEFT JOIN  CALLS C ON A.SUBSCRIBER_ID = C.SUBSCRIBER_ID AND C.INOUT_CD  = 'OUT'
WHERE
	B.CALL_DATE BETWEEN DATE '2019-01-01' AND DATE '2019-12-31'
	C.CALL_DATE BETWEEN DATE '2019-01-01' AND DATE '2019-12-31'
SELECT
	A.SUBSCRIBER_ID,
 	B.INCOMING,
 	B.OUTGOING
FROM 
	SUBSCRIBERS A
	LEFT JOIN  
	(
		SELECT 
			SUBSCRIBER_ID,
			MAX(CASE WHEN INOUT_CD  = 'IN' THEN VALUE END) AS AVN_OFF_BAL,
			MAX(CASE WHEN INOUT_CD  = 'OUT' THEN VALUE END) AS BAL_OFF_BAL
		FROM
			CALLS 
		WHERE
			CALL_DATE BETWEEN DATE '2019-01-01' AND DATE '2019-12-31'
			AND INOUT_CD IN ('IN', 'OUT')
		GROUP BY 1
	) B
ON A.SUBSCRIBER_ID = B.SUBSCRIBER_ID;

2. Avoid functions in Equi-Join Conditions

Most functions prevent ROWHASH access from being used when applied to a join column. These types of transformations are a sign of poor data modeling or poor data quality. If you solve the cause, then there is no need to use functions in joins.

a.SUBSCRIBER_ID = TRIM(LEADING '0' FROM b.SUBSCRIBER_ID)

3. Avoid the usage of UNION

A simple UNION causes a sorting and deduplication of the rows. This is a very expensive process. If it is ensured that the result sets cannot provide overlapping results, UNION ALL should always be used.

Instead of: 

SELECT   CAST ('FIRST'  AS VARCHAR (15)), PK FROM TheTable
UNION 
SELECT   CAST ('SECOND'  AS VARCHAR (15)), PK FROM TheTable;

Use:

SELECT   CAST ('FIRST'  AS VARCHAR (15)), PK FROM TheTable
UNION ALL
SELECT   CAST ('SECOND'  AS VARCHAR (15)), PK FROM TheTable;

4. Avoid GROUP BY over Expressions

Avoid grouping over expressions. For performance reasons, better use the underlying columns for the expression as grouping criteria, e.g., if you need to group by concatenated columns, use these columns in GROUP BY and not concatenated columns.

SELECT  COALESCE (SUBSCRIBER_ID ,0) || COALESCE(DESCRIPTION ,'') FROM THETABLE 
GROUP BY 
COALESCE (SUBSCRIBER_ID,0) || COALESCE(DESCRIPTION ,'');

Change to:

SELECT  COALESCE (SUBSCRIBER_ID ,0) || COALESCE(DESCRIPTION ,'') FROM THETABLE 
GROUP BY 
COALESCE (SUBSCRIBER_ID,0) , COALESCE(DESCRIPTION ,'');

5. Divide large Teradata SQL queries into smaller ones

A large single query can hit resource limits, especially SpoolSpace. To avoid this try to split the query into smaller chunks. Good splitting points are UNION (ALL) statements. These can be replaced by multiple insert statements. Another splitting point could be JOINs.
Consider materializing complex sub-queries in a volatile table first.

6. Consider the creation of additional statistics and indexes

It’s all about statistics.
Optimization of SQL statements makes absolutely no sense if we do not have proper statistics defined.

But: Proper statistics does not mean we have to collect statistics at all.

We can let the optimizer do this as the optimizer always samples statistics on indexes if they are not explicitly defined.
Still, we have to be sure that the optimizer has for each table and column the correct picture about data demographics.

The very first you should do is turning on diagnostics, as this will give you useful information about missing statistics:

DIAGNOSTIC HELPSTATS ON FOR SESSION;

Run afterward the EXPLAIN statement and you will find at the end of the output a list of statistics the optimizer is suggesting.

Not all suggested statistics may make sense in your scenario but they give you a good starting point.

Be careful: It’s not about adding all the statistics which are suggested!
What you should definitely look for is suggestions on columns that are used in JOINS and WHERE conditions (especially if not indexed).
Add them step by step, re-check the EXPLAIN output after each time you added statistics and look for changes in the plan.
If not, drop the statistics again. Do not add them all at once as you will not be able anymore to find out only the useful ones!

If the diagnostics suggest statistics on a column or a combination of columns that is unique or almost unique, go for sample statistics only (COLLECT STATISTICS USING SAMPLE ON COLUMN <(COLUMN)> as they are much faster and need fewer resources.

Check the plan for deviations between the estimated rows and the number of rows in the related table. This could be a hint that there are aged statistics on the table.

SHOW STATISTICS VALUES ON <TheTable>; is your best friend to get detailed inside on the statistics currently used by the Optimizer. It will even show you the skewed values of each column which is valuable information when designing your query.

Roland Wenzlofsky
 

Roland Wenzlofsky is a graduated computer scientist and Data Warehouse professional working with the Teradata database system for more than 15 years. He is experienced in the fields of banking and telecommunication with a strong focus on performance optimization.

>