Is Teradata-Specific SQL different from SQL?
While the SQL Teradata uses is a dialect of SQL, there are some differences compared to ANSI SQL. It is specifically designed for the Teradata architecture, focusing on parallel processing and scalability. While it supports most of the standard SQL features, some additional functions, optimizations, and syntax variations are unique to Teradata. However, if you are familiar with standard SQL, you should be able to adapt to the Teradata-specific SQL with relative ease.
What kind of SQL does Teradata use?
Teradata is largely ANSI-compliant, which means it adheres to the American National Standards Institute’s SQL standards. While it includes most standard SQL features, it also has some additional functions, optimizations, and syntax variations unique to the Teradata environment. This allows for better performance and more efficient query execution in Teradata databases.
What kind of database is Teradata?
Teradata is a relational database management system (RDBMS) specifically designed for large-scale data warehousing and analytics. It is built on a parallel processing architecture, which enables it to handle massive amounts of data and complex queries efficiently. Teradata’s unique architecture allows it to distribute data and processing tasks across multiple nodes, providing high performance and scalability. This makes Teradata an ideal choice for organizations dealing with large volumes of data and requiring powerful analytical capabilities.
How to connect Teradata’s SQL Assistant
To connect to Teradata using SQL Assistant, follow these steps:
Install SQL Assistant: First, ensure you have SQL Assistant installed on your computer. If you do not have it, download the Teradata Tools and Utilities (TTU) package from the Teradata website and install the appropriate components, including SQL Assistant.
Launch SQL Assistant:
Open SQL Assistant on your computer.
Configure an ODBC Data Source: Before connecting to a Teradata system, you must set up an ODBC data source. To do this, follow these steps:
a. Open the ODBC Data Source Administrator on your computer (usually in the Control Panel or Administrative Tools).
b. Go to the ‘System DSN’ or ‘User DSN’ tab and click ‘Add’.
c. Select the ‘Teradata’ driver from the list and click ‘Finish’.
d. Enter the required information for the data source, such as the Name, Description, and Teradata server’s IP address or hostname. You may also need to provide additional connection details, such as the Authentication Mechanism and login credentials.
e. Click ‘OK’ to save the data source.
Connect to the Teradata System:
a. In SQL Assistant, go to the ‘File’ menu and click ‘Connect’ or the ‘Connect’ icon on the toolbar.
b. Choose the ODBC data source you configured in step 3 from the ‘Data Source Name’ dropdown list.
c. Enter your Teradata username and password, and click ‘OK’ to establish the connection.
Now, you should be connected to the Teradata system, and you can start running queries and managing your data using SQL Assistant.
What is Teradata SQL Assistant?
SQL Assistant is a query and administration tool designed for Teradata databases. It provides an easy-to-use interface for writing, executing, and analyzing SQL queries and managing database objects. With its features like code highlighting, result formatting, and data export, SQL Assistant simplifies the process of working with Teradata databases, making it an ideal tool for both beginners and experienced database developers or administrators.
1. Avoid multiple Joins to the same table
Reducing the number of joins to a table is crucial, as each additional join could lead to a full table scan or index access. This can be accomplished by implementing subqueries, volatile tables, or ordered analytical functions.
This example showcases applying an ordered analytical function to attain this objective.
SELECT
A.SUBSCRIBER_ID,
B.VALUE AS INCOMING,
C.VALUE AS OUTGOING
FROM
SUBSCRIBERS A
LEFT JOIN CALLS B ON A.SUBSCRIBER_ID = B.SUBSCRIBER_ID AND B.INOUT_CD = 'IN'
LEFT JOIN CALLS C ON A.SUBSCRIBER_ID = C.SUBSCRIBER_ID AND C.INOUT_CD = 'OUT'
WHERE
B.CALL_DATE BETWEEN DATE '2019-01-01' AND DATE '2019-12-31'
C.CALL_DATE BETWEEN DATE '2019-01-01' AND DATE '2019-12-31'
SELECT
A.SUBSCRIBER_ID,
B.INCOMING,
B.OUTGOING
FROM
SUBSCRIBERS A
LEFT JOIN
(
SELECT
SUBSCRIBER_ID,
MAX(CASE WHEN INOUT_CD = 'IN' THEN VALUE END) AS AVN_OFF_BAL,
MAX(CASE WHEN INOUT_CD = 'OUT' THEN VALUE END) AS BAL_OFF_BAL
FROM
CALLS
WHERE
CALL_DATE BETWEEN DATE '2019-01-01' AND DATE '2019-12-31'
AND INOUT_CD IN ('IN', 'OUT')
GROUP BY 1
) B
ON A.SUBSCRIBER_ID = B.SUBSCRIBER_ID;
2. Avoid Functions in Equi-Join Conditions
When applied to join columns, Teradata SQL functions do not typically allow direct ROWHASH access. Functions are commonly utilized for joining when the data model is insufficiently normalized. Therefore, it is crucial to prioritize improving the data model before considering using functions.
a.SUBSCRIBER_ID = TRIM(LEADING '0' FROM b.SUBSCRIBER_ID)
3. Avoid the Usage of UNION
Teradata performs deduplication in the spool space as UNION disallows row duplicates. To accomplish deduplication, all branches are duplicated into a shared spool space and sorted, allowing duplicates to be removed. This method is significantly more intricate than utilizing a UNION ALL statement. Furthermore, it’s noteworthy that UNION doesn’t support any of the newer tuning features, like pushing joins into individual branches. If you wish to learn more about this topic, you’re in the right place.
Use UNION ALL instead of UNION when duplicate values are acceptable (as demonstrated below):
Instead of:
SELECT CAST ('FIRST' AS VARCHAR (15)), PK FROM TheTable
UNION
SELECT CAST ('SECOND' AS VARCHAR (15)), PK FROM TheTable;
Use:
SELECT CAST ('FIRST' AS VARCHAR (15)), PK FROM TheTable
UNION ALL
SELECT CAST ('SECOND' AS VARCHAR (15)), PK FROM TheTable;
4. Avoid GROUP BY over Expressions
Aggregating expressions and Teradata SQL functions are not advisable, as rows must be distributed based on the ROWHASH to perform aggregation, similar to joins. The example below illustrates improved performance by directly aggregating the underlying columns instead of concatenating:
SELECT COALESCE (SUBSCRIBER_ID ,0) || COALESCE(DESCRIPTION ,'') FROM THETABLE
GROUP BY
COALESCE (SUBSCRIBER_ID,0) || COALESCE(DESCRIPTION ,'');
Change to:
SELECT COALESCE (SUBSCRIBER_ID ,0) || COALESCE(DESCRIPTION ,'') FROM THETABLE
GROUP BY
COALESCE (SUBSCRIBER_ID,0) , COALESCE(DESCRIPTION ,'');
5. Divide large SQL Queries into smaller ones
Consider modularizing your program logic to mitigate the risks of extensive SQL statements. As statements increase, they may overextend spool space limits and generate non-optimal execution plans due to the exponential growth of possible joins. Effective points for breaking up SQL statements include SET operators (e.g. UNION, UNION ALL), subqueries, and joins. Evaluate which subqueries can be incorporated into volatile tables, which offer the benefit of collecting statistics and preprocessing data, such as resolving functions to reduce the need for them in join columns.
6. Consider the Creation of additional Statistics and Indexes
Before delving into further optimizations, it is crucial to ascertain that the statistics are readily available and current. The optimizer should fashion an optimal plan founded on the statistics before delving into other optimization approaches.
Note that dynamic AMP sampling can often generate a sound execution plan without the need for collected statistics.
To swiftly obtain an overview of your statistics, execute the command DIAGNOSTIC HELPSTATS ON FOR SESSION before explaining your Teradata SQL. Every EXPLAIN statement executed during the session will list recommended statistics and their corresponding confidence level. While it may not be necessary to implement every suggestion, they can aid in identifying noteworthy inaccuracies in your statistics.
It is recommended to gradually include statistics and observe the alterations in the execution plan.
Consider using sample statistics instead of optimizer-suggested statistics for highly distinct columns.
COLLECT STATISTICS USING SAMPLE ON COLUMN aPrettyDistinctColumn;
Collecting basic statistics is less resource-intensive and often sufficient in such instances.
To verify outdated statistics, execute a SELECT * on the table and compare the outcome with the projected estimate in the Execution Plan.
To gain a comprehensive understanding of a table’s statistics, use the following command:
SHOW STATISTICS VALUES ON aTable;
If you have arrived here, you may also want to read our article on Performance Tuning:
Find more details about Teradata and SQL here: