Teradata DBQL Performance Tuning
What Are The Most Important Performance Indicators?
In this article, I will show you by which key figures you can use to recognize if a query should be optimized.
The selected Teradata performance indicators illuminate the query from different angles and give a good overview of whether a query should be improved and what the problem is.
All columns listed in the calculations are available in the table DBC.DBQLOGTBL.
The Product Join Indicator
As the name suggests, this indicator is used to determine if there are product joins in a query.
It is not about desired product joins (e.g. cross join with a calendar), as they sometimes occur, but about people who consume large amounts of resources.
Unwanted product joins can be recognized by the fact that the ratio of CPU to IO consumption is high.
Values in the three-digit range are often reached here. The reason is the high number of comparisons of rows executed by a product join.
This is how the Product Join Indicator is calculated:
CASE WHEN TotalIOCount = 0 THEN 0 ELSE (AMPCPUTime * 1000)/TotalIOCount END
If the Product Join Indicator is 3 or higher it is worth taking a look at the execution plan and looking for a product join.
However, a high Product Join Indicator can also have other causes:
- Aggregations of large amounts of data that pass through the memory. Since the aggregation takes place in memory, hardly any IOs are required, but a lot of CPU seconds.
- Duplicate rows check when there are many hash collisions
- Extensive string operations as they are often generated in the queries of BI tools.
- The use of Multivalue Compression also tends to increase the Product Join Indicator. Since more rows can be packed into a data block when using MVC, fewer IOs are needed to copy the data into memory. This also applies to systems that do not have dedicated hardware for Block Level Compression. BLC needs CPU for decompressing.
The Indicator For Unnecessary IOs
Here it is exactly the opposite of the product join indicator.
Queries or steps are identified that take many IOs compared to the CPU seconds.
A high value for this indicator usually means that large amounts of data are read or copied.
This is often a sign that either the primary indexes are not used optimally or secondary indexes are missing.
Here is how this Indicator is calculated:
CASE WHEN AMPCPUTime = 0 THEN 0 ELSE TotalIOCount/(AMPCPUTime * 1000) END
As with the product join indicator, for values above 3, you should take a closer look at the execution plan.
The CPU and IO Skew Indicators
This is one of the most important indicators as it identifies skewed workload.
As you know, skew is one of the main causes of poor performance on a Teradata system.
With this indicator, we can search for skewed tables, joins that are skewed, etc.
The calculation for CPU skew is done like this:
CASE WHEN (AMPCPUTime / (HASHAMP()+1) =0 THEN 0 ELSE MaxAmpCPUTime/(AMPCPUTime / (hashamp()+1) END
The calculation for IO skew is done like this:
CASE WHEN (TotalIOCount / (HASHAMP()+1) =0 THEN 0 ELSE MaxAmpIO/(TotalIOCount / (HASHAMP()+1) END
Values above 2 should be the trigger to take a closer look at the execution plan.
The CPU Impact Indicator
This indicator goes one step further and calculates the effect of skewing on the exploitation of the parallelism of the system.
The indicator is calculated in this way:
MaxAmpCPUTime * (HASHAMP()+1)
Here the basic rule applies: the higher the value compared to the other queries the worse.
High CPU or IO Consumption in General
These are the most important key figures in Performance Tuning.
The great thing about it is:
A query that is executed on the same data always uses the same resources per execution. These metrics are therefore much more appropriate than the runtime of queries which depends on the rest of the workload.
The Query Response Time
As already mentioned, the response time of a query is not a very good indicator for tuning.
Nevertheless, the goal is, of course, to have the fastest possible queries at the end. Even the business user will not care how many CPU seconds a query takes. The result must be available at 8 o'clock in the morning.
At the end that is all that counts. Therefore we should not completely disregard the average runtime.
Armed with this set of metrics, you can now search for queries that consume the most resources on your Teradata system. Have fun!
More about the most important topics in Teradata Performance Tuning can be found here: