Teradata SQL Queries: 6 Effective Tricks to optimize them
1. Avoid multiple Joins to the same Table
Each join means either a full table scan or index access. In all your Teradata SQL queries, avoid multiple joins to the same table if possible. Often multiple joins can be bypassed by subqueries, the use of volatile tables, or ordered analytic functions.
Here is an example of how to use a pivot in a subquery to prevent multiple access to the same table:
SELECT A.SUBSCRIBER_ID, B.VALUE AS INCOMING, C.VALUE AS OUTGOING FROM SUBSCRIBERS A LEFT JOIN CALLS B ON A.SUBSCRIBER_ID = B.SUBSCRIBER_ID AND B.INOUT_CD = 'IN' LEFT JOIN CALLS C ON A.SUBSCRIBER_ID = C.SUBSCRIBER_ID AND C.INOUT_CD = 'OUT' WHERE B.CALL_DATE BETWEEN DATE '2019-01-01' AND DATE '2019-12-31' C.CALL_DATE BETWEEN DATE '2019-01-01' AND DATE '2019-12-31'
SELECT A.SUBSCRIBER_ID, B.INCOMING, B.OUTGOING FROM SUBSCRIBERS A LEFT JOIN ( SELECT SUBSCRIBER_ID, MAX(CASE WHEN INOUT_CD = 'IN' THEN VALUE END) AS AVN_OFF_BAL, MAX(CASE WHEN INOUT_CD = 'OUT' THEN VALUE END) AS BAL_OFF_BAL FROM CALLS WHERE CALL_DATE BETWEEN DATE '2019-01-01' AND DATE '2019-12-31' AND INOUT_CD IN ('IN', 'OUT') GROUP BY 1 ) B ON A.SUBSCRIBER_ID = B.SUBSCRIBER_ID;
2. Avoid functions in Equi-Join Conditions
Most functions prevent ROWHASH access from being used when applied to a join column. These types of transformations are a sign of poor data modeling or poor data quality. If you solve the cause, then there is no need to use functions in joins.
a.SUBSCRIBER_ID = TRIM(LEADING '0' FROM b.SUBSCRIBER_ID)
3. Avoid the usage of UNION
A simple UNION causes a sorting and deduplication of the rows. This is a very expensive process. If it is ensured that the result sets cannot provide overlapping results, UNION ALL should always be used.
Instead of: SELECT CAST ('FIRST' AS VARCHAR (15)), PK FROM TheTable UNION SELECT CAST ('SECOND' AS VARCHAR (15)), PK FROM TheTable; Use: SELECT CAST ('FIRST' AS VARCHAR (15)), PK FROM TheTable UNION ALL SELECT CAST ('SECOND' AS VARCHAR (15)), PK FROM TheTable;
4. Avoid GROUP BY over Expressions
Avoid grouping over expressions. For performance reasons, better use the underlying columns for the expression as grouping criteria, e.g., if you need to group by concatenated columns, use these columns in GROUP BY and not concatenated columns.
SELECT COALESCE (SUBSCRIBER_ID ,0) || COALESCE(DESCRIPTION ,'') FROM THETABLE GROUP BY COALESCE (SUBSCRIBER_ID,0) || COALESCE(DESCRIPTION ,''); Change to: SELECT COALESCE (SUBSCRIBER_ID ,0) || COALESCE(DESCRIPTION ,'') FROM THETABLE GROUP BY COALESCE (SUBSCRIBER_ID,0) , COALESCE(DESCRIPTION ,'');
5. Divide large Teradata SQL queries into smaller ones
A large single query can hit resource limits, especially SpoolSpace. To avoid this try to split the query into smaller chunks. Good splitting points are UNION (ALL) statements. These can be replaced by multiple insert statements. Another splitting point could be JOINs.
Consider materializing complex sub-queries in a volatile table first.
6. Consider the creation of additional statistics and indexes
But: Proper statistics does not mean we have to collect statistics at all.
We can let the optimizer do this as the optimizer always samples statistics on indexes if they are not explicitly defined.
Still, we have to be sure that the optimizer has for each table and column the correct picture about data demographics.
The very first you should do is turning on diagnostics, as this will give you useful information about missing statistics:
DIAGNOSTIC HELPSTATS ON FOR SESSION;
Run afterward the EXPLAIN statement and you will find at the end of the output a list of statistics the optimizer is suggesting.
Not all suggested statistics may make sense in your scenario but they give you a good starting point.
Be careful: It’s not about adding all the statistics which are suggested!
What you should definitely look for is suggestions on columns that are used in JOINS and WHERE conditions (especially if not indexed).
Add them step by step, re-check the EXPLAIN output after each time you added statistics and look for changes in the plan.
If not, drop the statistics again. Do not add them all at once as you will not be able anymore to find out only the useful ones!
If the diagnostics suggest statistics on a column or a combination of columns that is unique or almost unique, go for sample statistics only (COLLECT STATISTICS USING SAMPLE ON COLUMN <(COLUMN)> as they are much faster and need fewer resources.
Check the plan for deviations between the estimated rows and the number of rows in the related table. This could be a hint that there are aged statistics on the table.
SHOW STATISTICS VALUES ON <TheTable>; is your best friend to get detailed inside on the statistics currently used by the Optimizer. It will even show you the skewed values of each column which is valuable information when designing your query.