6 Tips for Optimizing Teradata SQL Queries: Avoiding Joins, Functions, Union, and Group By

Is Teradata-Specific SQL different from SQL?

Teradata SQL is a dialect of SQL, so there are some differences compared to ANSI SQL. It is designed for the Teradata architecture, with a focus on parallel processing and scalability. It supports most standard SQL features but adds functions, optimizations, and syntax variations that are unique to Teradata. If you are familiar with standard SQL, you should be able to adapt to Teradata-specific SQL with relative ease.

What kind of SQL does Teradata use?

Teradata is largely ANSI-compliant, which means it adheres to the American National Standards Institute’s SQL standards. While it includes most standard SQL features, it also has some additional functions, optimizations, and syntax variations unique to the Teradata environment. This allows for better performance and more efficient query execution in Teradata databases.

What kind of database is Teradata?

Teradata is a relational database management system (RDBMS) specifically designed for large-scale data warehousing and analytics. It is built on a parallel processing architecture, which enables it to handle massive amounts of data and complex queries efficiently. Teradata’s unique architecture allows it to distribute data and processing tasks across multiple nodes, providing high performance and scalability. This makes Teradata an ideal choice for organizations dealing with large volumes of data and requiring powerful analytical capabilities.

How to connect to Teradata using SQL Assistant

To connect to Teradata using SQL Assistant, follow these steps:

1. Install SQL Assistant: Ensure SQL Assistant is installed on your computer. If not, download the Teradata Tools and Utilities (TTU) package from the Teradata website and install the appropriate components, including SQL Assistant.

2. Launch SQL Assistant.

3. Configure an ODBC data source: Before connecting to a Teradata system, you must set up an ODBC data source:
a. Open the ODBC Data Source Administrator on your computer (usually found in the Control Panel or Administrative Tools).
b. Go to the ‘System DSN’ or ‘User DSN’ tab and click ‘Add’.
c. Select the ‘Teradata’ driver from the list and click ‘Finish’.
d. Enter the required information for the data source, such as the Name, Description, and the Teradata server’s IP address or hostname. You may also need to provide additional connection details, such as the Authentication Mechanism and login credentials.
e. Click ‘OK’ to save the data source.

4. Connect to the Teradata system:
a. In SQL Assistant, go to the ‘File’ menu and click ‘Connect’, or click the ‘Connect’ icon on the toolbar.
b. Choose the ODBC data source you configured in step 3 from the ‘Data Source Name’ dropdown list.
c. Enter your Teradata username and password, then click ‘OK’ to establish the connection.

You should now be connected to the Teradata system and can start running queries and managing your data with SQL Assistant.

What is Teradata SQL Assistant?

SQL Assistant is a query and administration tool designed for Teradata databases. It provides an easy-to-use interface for writing, executing, and analyzing SQL queries and managing database objects. With its features like code highlighting, result formatting, and data export, SQL Assistant simplifies the process of working with Teradata databases, making it an ideal tool for both beginners and experienced database developers or administrators.

1. Avoid multiple Joins to the same table

Reducing the number of joins to the same table is crucial, as each additional join typically causes another full-table scan or index access. Multiple joins to one table can often be replaced with subqueries, volatile tables, or ordered analytical functions.

The example below uses a related technique, conditional aggregation (CASE expressions inside MAX), to read the CALLS table only once instead of joining it twice.

Instead of:

SELECT
	A.SUBSCRIBER_ID,
 	B.VALUE AS INCOMING,
 	C.VALUE AS OUTGOING
FROM 
	SUBSCRIBERS A
	LEFT JOIN  CALLS B ON A.SUBSCRIBER_ID = B.SUBSCRIBER_ID AND B.INOUT_CD  = 'IN'
	LEFT JOIN  CALLS C ON A.SUBSCRIBER_ID = C.SUBSCRIBER_ID AND C.INOUT_CD  = 'OUT'
WHERE
	B.CALL_DATE BETWEEN DATE '2019-01-01' AND DATE '2019-12-31'
	AND C.CALL_DATE BETWEEN DATE '2019-01-01' AND DATE '2019-12-31';

Use:
SELECT
	A.SUBSCRIBER_ID,
 	B.INCOMING,
 	B.OUTGOING
FROM 
	SUBSCRIBERS A
	LEFT JOIN  
	(
		SELECT 
			SUBSCRIBER_ID,
			MAX(CASE WHEN INOUT_CD  = 'IN' THEN VALUE END) AS INCOMING,
			MAX(CASE WHEN INOUT_CD  = 'OUT' THEN VALUE END) AS OUTGOING
		FROM
			CALLS 
		WHERE
			CALL_DATE BETWEEN DATE '2019-01-01' AND DATE '2019-12-31'
			AND INOUT_CD IN ('IN', 'OUT')
		GROUP BY 1
	) B
ON A.SUBSCRIBER_ID = B.SUBSCRIBER_ID;
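
The rewrite can be sanity-checked outside Teradata. The sketch below uses Python's sqlite3 purely for illustration (the sample data is made up, and Teradata-specific behavior such as spool usage and ROWHASH access naturally does not apply); it confirms that the single-scan conditional aggregation pivots IN and OUT values into one row per subscriber:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE SUBSCRIBERS (SUBSCRIBER_ID INTEGER);
CREATE TABLE CALLS (SUBSCRIBER_ID INTEGER, INOUT_CD TEXT,
                    VALUE INTEGER, CALL_DATE TEXT);
INSERT INTO SUBSCRIBERS VALUES (1), (2);
INSERT INTO CALLS VALUES
  (1, 'IN',  10, '2019-03-01'),
  (1, 'OUT', 20, '2019-04-01'),
  (2, 'IN',  30, '2019-05-01');
""")

# CALLS is read only once; IN and OUT are pivoted into columns via CASE.
rows = con.execute("""
SELECT A.SUBSCRIBER_ID, B.INCOMING, B.OUTGOING
FROM SUBSCRIBERS A
LEFT JOIN (
    SELECT SUBSCRIBER_ID,
           MAX(CASE WHEN INOUT_CD = 'IN'  THEN VALUE END) AS INCOMING,
           MAX(CASE WHEN INOUT_CD = 'OUT' THEN VALUE END) AS OUTGOING
    FROM CALLS
    WHERE CALL_DATE BETWEEN '2019-01-01' AND '2019-12-31'
      AND INOUT_CD IN ('IN', 'OUT')
    GROUP BY SUBSCRIBER_ID
) B ON A.SUBSCRIBER_ID = B.SUBSCRIBER_ID
""").fetchall()
print(sorted(rows))  # [(1, 10, 20), (2, 30, None)]
```

Note that subscriber 2 still appears with a NULL outgoing value, which the original double left join would only preserve if the date filters were moved into the ON conditions.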

2. Avoid Functions in Equi-Join Conditions

When functions are applied to join columns, Teradata typically cannot use direct ROWHASH access, and the optimizer must fall back to a less efficient join plan. Functions in join conditions are usually a symptom of an insufficiently normalized data model, so improving the data model should take priority over working around it with functions. A typical condition to avoid:

a.SUBSCRIBER_ID = TRIM(LEADING '0' FROM b.SUBSCRIBER_ID)
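
A cleaner fix than trimming inside every join is to normalize the key once in the data model. The sketch below uses Python's sqlite3 for illustration (the table names and the one-off LTRIM-based normalization are assumptions, not the article's schema); it persists the normalized key so the join becomes a plain equi-join:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE A_TBL (SUBSCRIBER_ID TEXT);
CREATE TABLE B_TBL (SUBSCRIBER_ID TEXT);  -- stored with leading zeros
INSERT INTO A_TBL VALUES ('42');
INSERT INTO B_TBL VALUES ('00042');
""")

# One-off model fix: store the normalized key instead of trimming per join.
con.executescript("""
ALTER TABLE B_TBL ADD COLUMN SUBSCRIBER_ID_NORM TEXT;
UPDATE B_TBL SET SUBSCRIBER_ID_NORM = LTRIM(SUBSCRIBER_ID, '0');
""")

# The join condition no longer wraps a function around the join column.
rows = con.execute("""
SELECT a.SUBSCRIBER_ID
FROM A_TBL a
JOIN B_TBL b ON a.SUBSCRIBER_ID = b.SUBSCRIBER_ID_NORM
""").fetchall()
print(rows)  # [('42',)]
```

On Teradata, the normalized column could additionally serve as (part of) the primary index, restoring direct ROWHASH access for the join.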

3. Avoid the Usage of UNION

Teradata must perform deduplication in spool space because UNION does not allow duplicate rows: all branches are copied into a common spool and sorted so that duplicates can be removed. This is significantly more expensive than a UNION ALL. Furthermore, UNION does not benefit from newer tuning features, such as pushing joins into the individual branches.

Enhanced Performance Features for UNION ALL

Use UNION ALL instead of UNION when duplicate values are acceptable (as demonstrated below):

Instead of: 

SELECT   CAST ('FIRST'  AS VARCHAR (15)), PK FROM TheTable
UNION 
SELECT   CAST ('SECOND'  AS VARCHAR (15)), PK FROM TheTable;

Use:

SELECT   CAST ('FIRST'  AS VARCHAR (15)), PK FROM TheTable
UNION ALL
SELECT   CAST ('SECOND'  AS VARCHAR (15)), PK FROM TheTable;
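
The cost difference comes from the implicit deduplication. That behavior itself is standard SQL and easy to observe; a minimal sketch with Python's sqlite3 (sample data made up) shows UNION silently halving the row count that UNION ALL returns:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TheTable (PK INTEGER)")
con.executemany("INSERT INTO TheTable VALUES (?)", [(1,), (2,)])

# UNION must sort and deduplicate the combined result; UNION ALL appends as-is.
union_rows = con.execute(
    "SELECT PK FROM TheTable UNION SELECT PK FROM TheTable").fetchall()
union_all_rows = con.execute(
    "SELECT PK FROM TheTable UNION ALL SELECT PK FROM TheTable").fetchall()
print(len(union_rows), len(union_all_rows))  # 2 4
```

If the branches are known to be disjoint (as with the distinct literals above), the deduplication work of UNION is pure overhead.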

4. Avoid GROUP BY over Expressions

Avoid grouping by expressions and Teradata SQL function results: as with joins, rows must be redistributed based on the ROWHASH of the grouping expression before aggregation can take place. The example below performs better when the underlying columns are grouped directly rather than their concatenation. Instead of:

SELECT  COALESCE (SUBSCRIBER_ID ,0) || COALESCE(DESCRIPTION ,'') FROM THETABLE 
GROUP BY 
COALESCE (SUBSCRIBER_ID,0) || COALESCE(DESCRIPTION ,'');

Change to:

SELECT  COALESCE (SUBSCRIBER_ID ,0) || COALESCE(DESCRIPTION ,'') FROM THETABLE 
GROUP BY 
COALESCE (SUBSCRIBER_ID,0) , COALESCE(DESCRIPTION ,'');
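
Both variants return the same result set, which is what makes the rewrite safe; only the redistribution step differs on Teradata. A sketch verifying the equivalence with Python's sqlite3 (sample data made up; the ROWHASH redistribution itself is Teradata-internal and not visible here):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE THETABLE (SUBSCRIBER_ID INTEGER, DESCRIPTION TEXT);
INSERT INTO THETABLE VALUES (1, 'a'), (1, 'a'), (2, 'b');
""")

query = """
SELECT COALESCE(SUBSCRIBER_ID, 0) || COALESCE(DESCRIPTION, '') AS grp
FROM THETABLE
GROUP BY {group_by}
"""
# Grouping by the concatenated expression ...
by_expr = con.execute(query.format(
    group_by="COALESCE(SUBSCRIBER_ID, 0) || COALESCE(DESCRIPTION, '')"
)).fetchall()
# ... and grouping by the underlying columns yield the same groups.
by_cols = con.execute(query.format(
    group_by="COALESCE(SUBSCRIBER_ID, 0), COALESCE(DESCRIPTION, '')"
)).fetchall()
print(sorted(by_expr) == sorted(by_cols))  # True
```

The equivalence holds because two rows agree on the concatenation whenever they agree on both columns (barring pathological collisions such as '1' || '2a' versus '12' || 'a').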

5. Divide large SQL Queries into smaller ones

Consider modularizing your program logic to mitigate the risks of very large SQL statements. As statements grow, they can exceed spool space limits and produce sub-optimal execution plans, because the number of possible join orders grows exponentially. Good points for splitting SQL statements are SET operators (e.g., UNION, UNION ALL), subqueries, and joins. Evaluate which subqueries can be moved into volatile tables; these offer the added benefits of collecting statistics and preprocessing data, for example by resolving functions up front so they are no longer needed on join columns.
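
The volatile-table pattern can be sketched generically. In the illustration below, a TEMP table in Python's sqlite3 stands in for a Teradata volatile table (table and column names are made up): the subquery is materialized once, with the function on the join column already resolved:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE CALLS (SUBSCRIBER_ID TEXT, VALUE INTEGER);
INSERT INTO CALLS VALUES ('00042', 10), ('00042', 5), ('7', 1);
""")

# Materialize the preprocessed intermediate result once. The TEMP table here
# stands in for a Teradata volatile table: the function on the join column is
# resolved up front, so later steps can use plain equi-joins on the clean key.
con.executescript("""
CREATE TEMP TABLE CALLS_PREP AS
SELECT LTRIM(SUBSCRIBER_ID, '0') AS SUBSCRIBER_ID, SUM(VALUE) AS TOTAL
FROM CALLS
GROUP BY LTRIM(SUBSCRIBER_ID, '0');
""")

rows = con.execute(
    "SELECT SUBSCRIBER_ID, TOTAL FROM CALLS_PREP ORDER BY SUBSCRIBER_ID"
).fetchall()
print(rows)  # [('42', 15), ('7', 1)]
```

On Teradata itself, this would be a CREATE VOLATILE TABLE ... ON COMMIT PRESERVE ROWS, followed by COLLECT STATISTICS on the intermediate table before it is joined further.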

6. Consider the Creation of additional Statistics and Indexes

Before delving into further optimizations, it is crucial to ascertain that the statistics are readily available and current. The optimizer should build an optimal plan based on the statistics before moving on to other optimization approaches.

Note that dynamic AMP sampling can often generate a sound execution plan without the need for collected statistics.

To swiftly obtain an overview of your statistics, execute the command DIAGNOSTIC HELPSTATS ON FOR SESSION before explaining your Teradata SQL. Every EXPLAIN statement executed during the session will list recommended statistics and their corresponding confidence level. While it may not be necessary to implement every suggestion, they can aid in identifying noteworthy inaccuracies in your statistics.

It is recommended to gradually include statistics and observe the alterations in the execution plan.

Consider using sample statistics instead of optimizer-suggested statistics for highly distinct columns.

COLLECT STATISTICS USING SAMPLE ON COLUMN aPrettyDistinctColumn;

Collecting sampled statistics is less resource-intensive and is often sufficient for such columns.

To check for outdated statistics, compare the table’s actual row count (e.g., via SELECT COUNT(*)) with the estimated row count shown in the execution plan.

To gain a comprehensive understanding of a table’s statistics, use the following command:

SHOW STATISTICS VALUES ON aTable;

You may also want to read our article on Teradata performance tuning.

Find more details about Teradata and SQL here:

https://support.teradata.com
