Teradata Tactical Workload

DWH Pro Admin

June 4, 2022

minutes reading time


Introduction

In this blog post, we talk about tactical workloads on a Teradata system. While Teradata has received features that support tactical workloads over the years, it is still a challenging workload category to handle. We must ideally choose the physical design to meet user expectations for query speed.

Nothing is more frustrating than designing the Teradata tactical workload on a test environment, only to find out that it does not meet user expectations on the production system.

Over the years, I have often seen applications that execute tactical queries fail to meet user expectations because they were implemented as if they were implementing a data warehouse for a strategic workload.

Let’s take a web application where users can retrieve information about individual customers. This is a classic OLTP task and not necessarily a suitable workload for a data warehouse system. The user will usually expect to be able to click through the customer’s data in real-time. Usability falls and stands with performance.

We need to achieve stable query run times to have satisfied users. No user will find it satisfying if the same query sometimes takes 2 seconds and sometimes several minutes. But this is exactly what I have experienced often because the designers did not consider tactical workload specifics in the physical database design. Usually, applications are developed on an unloaded test system; the big surprise comes when the application is put into production and competing with the production workload.

The Cause of Performance Problems for Tactical Queries

Teradata is a parallel shared-nothing database system designed to read large amounts of data as efficiently as possible. Unfortunately, this is not what we need for a tactical workload. Here it is about accessing individual physical rows on the disk as efficiently as possible (reading and writing).

The big problem with tactical queries in Teradata is full table scans, as their runtime depends on the Teradata system’s overall load. Of course, we can theoretically improve the runtimes by increasing the size of the system (by adding new nodes). Still, this is usually not feasible, and further, Teradata can return the query result a little bit faster, but the run times are still not stable. I have seen projects where they bought a dedicated Teradata system just to run the tactical application isolated on it. This will usually not be possible for various reasons. The question arises: Why does it have to be a Teradata system? Why not a system that is specialized in tactical workload?

As shown below, the optimization goal of tactical query tuning is to minimize the number of needed IOs to access one or a few rows on a single AMP. Optimally, a tactical query execution plan localizes the work in only one AMP and requires just one data block transfer from disk to memory.

Teradata Tactical Workload

Teradata Workload Management supports tactical queries (which have the above-described characteristics) with a tactical workload group, in which AWTs are reserved and only used for single-AMP accesses. This approach guarantees that tactical queries get their turn immediately and do not have to wait for AWT availability.

However, a query is immediately moved to a lower priority workload group should the query not behave as expected to protect valuable resources (we describe the details later in this blog post)

We aim for a query that does single AMP access to ensure almost the same execution time for each execution. Therefore, achieving a stable execution plan with the proper statistics is essential.

How to design Tactical Workload in Teradata

We need to distinguish between retrieve steps and join steps.

Retrieve steps are easier to be optimized than join steps: we need a primary index (UPI, NUPI) or similar access (for example, a single table join index). Primary index access ensures that data rows can be localized within one or a few data blocks, and access time is guaranteed to be stable.

Joins are a different kind of workload as they usually involve most or all AMPs. Still, we can design the workload to use the nested join and merge joins that involve only a few AMPs – these are the only two join methods that the Optimizer can restrict to one or a few AMPs. The best option is the nested join, which can be used in an initial step as a filter to limit to exactly one row.

Another advantage of single-AMP operations is that the parsing engine uses ROWHASH-level locking for the retrieving step, and a more impacting locking step is not needed. ROWHASH-level locking allows for higher concurrency when many tactical queries simultaneously access the same table.

So, as we have seen, our main task in designing tactical workloads is to obtain an execution plan that does not involve full table scans but predominantly single-AMP or group-AMPs steps.

As we mentioned, the primary index access in each step is our most important tool. Therefore, using a database dedicated to the application that submits tactical queries is often advantageous. Because then we are free in our decision to model the indexes as we need them. Unfortunately, this is not always possible. From my experience, the same database is often used for tactical and strategic workloads, and the free choice of the primary index is not given. Then there is still the possibility to design direct access using a Join Index or secondary indexes.

A unique trick is to create a single table join index that acts like a “hashed” NUSI. I.e., we have the advantages of direct access combined with the benefit that the index does not have to be unique.

For row partitioned tables, the partition columns should always be part of the primary index, if possible, since this ensures that a row has to be searched in only one partition.

Tactical Workload Exceptions as Tuning Help

For optimal performance, Teradata must first recognize tactical workloads as such. Another hurdle is that we need to achieve that our queries are executed entirely in the tactical workload tier and are not prematurely moved to a lower priority workload tier.

Since the tactical workload tier gives full access to all resources, Teradata has implemented a few security measures to prevent a lousy query from crippling the system. These are called Tactical Workload Exceptions based on CPU usage and IO consumption. Specific limits are defined, and queries that do not (or no longer) stay below these limits are moved to another workload with lower priority.

We can define limits for IO usage and CPU seconds usage across all nodes (summed up) or per node.

The priority scheduler manages the limits per node (as an independent computer), and the summed-up limits are controlled by TASM.

Teradata defines the following limits by default for the tactical workload:

  • Maximum 2 CPU seconds per node
  • Maximum 200 megabytes per node

We must multiply these numbers by the number of nodes for the total limits.

We cannot remove these limits entirely, but they can be changed, though this has to be done carefully because the impact on the system’s overall performance can be huge.

The limits per node only affect the node which exceeds the limit. I.e., it may be that only on specific nodes the AMP worker tasks of a query are moved from the tactical tier to one with lower priority. This will happen if the query has high skew and one or a few AMPs have most of the load.

On the other hand, the overall limits are activated on all nodes. If the total limit (e.g., CPU seconds) is exceeded, all nodes will prioritize the workload lower.

Tuning with Teradata Workload Exceptions

If we want to know if a tactical workload exception downgraded our query, we can see it in the DBC.DBQLOGTBL table.
If WDID and FinalWDID are different, the reason is a tactical workload exception. In the columns TacticalCPUException and TacticalIOException, we can see, in this case, how many nodes have thrown an exception.

To design a tactical workload, we must use the query log to check if our queries start in the tactical tier and end there. Never rely on run times. Check in the table DBC.DBQLOGTBL as well the number of AMPs involved in each step.

Final Considerations

Finally, I would like to point out how important it is that, in addition to tuning measures, our tactical queries must also be supported by workload management, and we must prevent blocking. Blocking can be minimized by the correct use of object locks (LOCKING FOR ACCESS). Workload delays are to be avoided for the tactical workload. If the tactical queries are optimized, the Teradata user under which it is executed should not be exposed to delays. It is difficult for the end-user to distinguish why a result does not come back immediately – and he will not care.

  • Avatar
    Arent de Besche says:

    Nice article, thanks. The “special trick” was interesting! We use dedicated database for web-service client/purpose. A stored proc runs periodically and re-structures source data to fit the definition of table/index designed as needed. The single/few AMP-query is packaged in a macro with input parameters. This to restrict the client (allow only execute macro). A dedicated TD UserName is created for the purpose.
    Tip: You should pay attention to the workload setting, the initial priority given to the query/user, and the time allowed before given reduced priority.
    You should set requirements on the client-side environment. Let’s say it is a MS web server component using the Teradata .NET Data Provider with default settings, connection pooling is on, and it runs on a clustered 4 node web server. The default max pool size is 100 (seen on version 16). From all 4 nodes on a very busy day, that could result in 400 sessions from one tactical client. Say you support several such client services on one Teradata system, and do the math. Then ask your TD DBA how many sessions and AWTs are available at hot hours. Probably you would require the client to set a relatively short query response timeout value depending on your use-case (range around 6 to 20 sec maybe?), connection pooling is beneficial, but restrict the size depending on your situation: A max pool size of 5 for a TD Data Provider that runs on 4 nodes means you allow 20 sessions. Go tactical! Best regards Arent

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like

    >