Get blazing fast Teradata queries with these 6 tricks

Roland Wenzlofsky

October 13, 2019



1. Avoid multiple Joins to the same table

Each additional join causes either a full table scan or an index access. We should therefore avoid joining the same table multiple times. Often we can prevent repeated joins to the same table by using subqueries, volatile tables, or ordered analytical functions.

The following example shows how two joins to the same table can be replaced by a single join to an aggregated subquery (the same result can often also be achieved with ordered analytical functions):

Instead of:

SELECT
	A.SUBSCRIBER_ID,
	B.VALUE AS INCOMING,
	C.VALUE AS OUTGOING
FROM 
	SUBSCRIBERS A
	LEFT JOIN CALLS B ON A.SUBSCRIBER_ID = B.SUBSCRIBER_ID AND B.INOUT_CD = 'IN'
	LEFT JOIN CALLS C ON A.SUBSCRIBER_ID = C.SUBSCRIBER_ID AND C.INOUT_CD = 'OUT'
WHERE
	B.CALL_DATE BETWEEN DATE '2019-01-01' AND DATE '2019-12-31'
	AND C.CALL_DATE BETWEEN DATE '2019-01-01' AND DATE '2019-12-31';

Use:

SELECT
	A.SUBSCRIBER_ID,
	B.INCOMING,
	B.OUTGOING
FROM 
	SUBSCRIBERS A
	LEFT JOIN 
	(
		SELECT 
			SUBSCRIBER_ID,
			MAX(CASE WHEN INOUT_CD = 'IN' THEN VALUE END) AS INCOMING,
			MAX(CASE WHEN INOUT_CD = 'OUT' THEN VALUE END) AS OUTGOING
		FROM
			CALLS
		WHERE
			CALL_DATE BETWEEN DATE '2019-01-01' AND DATE '2019-12-31'
			AND INOUT_CD IN ('IN', 'OUT')
		GROUP BY 1
	) B
	ON A.SUBSCRIBER_ID = B.SUBSCRIBER_ID;

2. Avoid Functions in Equi-Join Conditions

When applied to join columns, most Teradata functions prevent direct row access via the ROWHASH. Functions in join conditions are usually a symptom of a poor (insufficiently normalized) data model, so we should always think about improving the data model first. A typical example of such a join condition:

a.SUBSCRIBER_ID = TRIM(LEADING '0' FROM b.SUBSCRIBER_ID)
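If the data model cannot be fixed, one workaround is to resolve the function once in a preparatory step and join on plain columns afterwards. A minimal sketch, assuming a second subscriber table with zero-padded keys (all table and column names are illustrative):

-- Materialize the cleaned join key once instead of evaluating
-- TRIM() in every join condition:
CREATE VOLATILE TABLE PREPAID_PREP AS
(
	SELECT
		TRIM(LEADING '0' FROM SUBSCRIBER_ID) AS SUBSCRIBER_ID,
		VALUE
	FROM PREPAID_SUBSCRIBERS
) WITH DATA
PRIMARY INDEX (SUBSCRIBER_ID)
ON COMMIT PRESERVE ROWS;

COLLECT STATISTICS COLUMN (SUBSCRIBER_ID) ON PREPAID_PREP;

-- The join can now work directly on the ROWHASH of SUBSCRIBER_ID:
SELECT a.SUBSCRIBER_ID, b.VALUE
FROM SUBSCRIBERS a
INNER JOIN PREPAID_PREP b
ON a.SUBSCRIBER_ID = b.SUBSCRIBER_ID;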

3. Avoid the Usage of UNION

Since UNION does not allow duplicate rows, Teradata must deduplicate the result. Deduplication can only be done in one way: all branches are copied into a common spool, the spool is sorted, and then Teradata can eliminate the duplicates. This alone is much more expensive than a UNION ALL statement. What also has to be considered is that UNION does not allow any of the newer tuning features, such as pushing joins into the individual branches. If you want to read more about this topic, see the following article:

Enhanced Performance Features for UNION ALL

Use UNION ALL instead of UNION whenever duplicates are not a problem (as shown below):

Instead of: 

SELECT   CAST ('FIRST'  AS VARCHAR (15)), PK FROM TheTable
UNION 
SELECT   CAST ('SECOND'  AS VARCHAR (15)), PK FROM TheTable;

Use:

SELECT   CAST ('FIRST'  AS VARCHAR (15)), PK FROM TheTable
UNION ALL
SELECT   CAST ('SECOND'  AS VARCHAR (15)), PK FROM TheTable;

4. Avoid GROUP BY over Expressions

As with joins, aggregations should not be defined on expressions and functions, because before the aggregation can be performed, the rows must be redistributed according to their ROWHASH. The following example shows that you can achieve better performance by grouping not on the concatenation but on the underlying columns.

SELECT COALESCE(SUBSCRIBER_ID, 0) || COALESCE(DESCRIPTION, '') FROM THETABLE
GROUP BY
COALESCE(SUBSCRIBER_ID, 0) || COALESCE(DESCRIPTION, '');

Change to:

SELECT COALESCE(SUBSCRIBER_ID, 0) || COALESCE(DESCRIPTION, '') FROM THETABLE
GROUP BY
COALESCE(SUBSCRIBER_ID, 0), COALESCE(DESCRIPTION, '');

5. Divide large Teradata SQL Queries into smaller ones

You should think about making your program logic modular. The larger an SQL statement becomes, the greater the risk of reaching certain limits (e.g., spool space). Furthermore, with the number of joins, the number of possible execution plans grows exponentially, and with it the risk that the optimizer picks a non-optimal plan. Good places to split an SQL statement are, for example, SET operators (UNION, UNION ALL), subqueries, and joins. Think about which subqueries you can materialize as volatile tables. The advantage of volatile tables is that you can collect statistics on them and do preprocessing (e.g., resolve functions so you don't have to apply them later to join columns).
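A split along a subquery boundary might look like this (table and column names are illustrative):

-- Step 1: materialize the pre-filtered subquery as a volatile table
CREATE VOLATILE TABLE CALLS_2019 AS
(
	SELECT SUBSCRIBER_ID, INOUT_CD, VALUE
	FROM CALLS
	WHERE CALL_DATE BETWEEN DATE '2019-01-01' AND DATE '2019-12-31'
) WITH DATA
PRIMARY INDEX (SUBSCRIBER_ID)
ON COMMIT PRESERVE ROWS;

-- Step 2: give the optimizer statistics on the intermediate result
COLLECT STATISTICS COLUMN (SUBSCRIBER_ID) ON CALLS_2019;

-- Step 3: the subsequent, smaller statements join against the
-- pre-filtered volatile table instead of the large base table
SELECT a.SUBSCRIBER_ID, b.VALUE
FROM SUBSCRIBERS a
LEFT JOIN CALLS_2019 b
ON a.SUBSCRIBER_ID = b.SUBSCRIBER_ID;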

6. Consider the Creation of additional Statistics and Indexes

All further optimizations only make sense if the required statistics are available and up-to-date. First, we want the optimizer to create an optimal plan based on the statistics. Only then should we think about other optimization techniques.
Please note that we do not necessarily mean collected statistics. Often dynamic AMP sampling is sufficient to create a good execution plan.

The fastest way to get an overview of our statistics is to run the command DIAGNOSTIC HELPSTATS ON FOR SESSION.
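A typical session could look like this (the query and table names are illustrative):

DIAGNOSTIC HELPSTATS ON FOR SESSION;

EXPLAIN
SELECT a.SUBSCRIBER_ID, b.VALUE
FROM SUBSCRIBERS a
INNER JOIN CALLS b
ON a.SUBSCRIBER_ID = b.SUBSCRIBER_ID;

The plan output then ends with a list of statistics the optimizer would like to have for this query.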

Each EXPLAIN statement executed in the same session then shows a list of recommended statistics and their confidence levels at the end. We should not adopt all suggestions blindly, but they will at least immediately reveal the gross gaps in your statistics landscape.

It is best to add the statistics step by step and check if and how the execution plan changes.

If the optimizer suggests statistics on columns with many distinct values, consider whether sample statistics are the better choice:

COLLECT STATISTICS USING SAMPLE ON COLUMN aPrettyDistinctColumn;

Sample statistics are cheaper to collect and often sufficient in this case.

To check whether your statistics are outdated, compare the actual number of rows in a table with the estimate shown in the execution plan of a SELECT * on that table.
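One simple way to do this comparison (table name illustrative):

EXPLAIN SELECT * FROM CALLS;  -- note the estimated row count of the retrieve step
SELECT COUNT(*) FROM CALLS;   -- compare against the actual row count

If the estimate and the actual count differ significantly, the statistics should be refreshed.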

If you want detailed insight into the statistics of a table, then the following command is beneficial:

SHOW STATISTICS VALUES ON aTable;

This command provides a lot of information about statistics on table level, column level, skew, etc.
