UPDATED Tactical workload tuning is a particular discipline on Teradata. Tactical workload requires a unique skill set and tools.
In this article, I will show you the skills needed to optimize tactical workload.
You will learn how to use the proper tools and the “tactical workload skill set.”
Decision support workload typically involves full table scans (FTS) and all-row joins -usually between big tables.
Indexing (NUSI, USI, etc.) can improve performance, but more often all table rows are accessed.
Here is an example of a decision support query:
SELECT CustomerGroupName, SUM(Revenue)
t01.GroupCd = t02.GroupCd
GROUP BY 1;
The tactical workload is characterized by single row access or the access to a limited number of rows. Here is an example for a tactical query:
WHERE SalesId = 100;
Optimized tactical queries need indexed access paths, such as UPI, NUPI, NUSI, USI, Join Indexes or a single row partition. The primary goal is to have an execution plan with each step only delivering one or a few rows.
The tactical selection criteria of your query have to be applied as early as possible in the execution plan.
While this allows avoiding deadlocks entirely, serialization might not be useable in any case for performance reasons.
The solution I used was to rewrite the most executed statements, increasing the number of changes to the database. By keeping the number of changes small, I was able to prevent deadlocks.
With a proper query design, tactical query execution is stable, and performance is not decreasing if table cardinality is growing, while decision support queries usually become slower with a growing amount of data:
Tactical query performance does not improve with the number of Teradata nodes. Adding more nodes to your Teradata system will not have any impact on the performance of your single tactical query. Nevertheless, adding nodes to your system allows more tactical queries to be executed at the same time.
The performance of decision support queries improves with the number of nodes. Assuming a perfect data distribution (no skewing), doubling the number of nodes can speed up your query by a factor of two (nevertheless, this are pure numbers, no system is working 100% in parallel and without any skewing effects).
Besides optimizing the retrieve steps with indexed access to single rows (USI, UPI, NUSI, Join Index, etc.), we have to optimize the joins. Join optimization means to minimize the number of AMPs involved in the join operations.
For example, the execution plan of below query shows that the selected row is accessed from only one AMP (single-AMP join step):
Above query will always perform identical (given the execution plan is not changing), and this is what we want to achieve in tactical workload tuning: A stable direct access to the data.
While above example was built around a merge join, the most suitable join type for the tactical workload is the Nested Join.
When designing your physical data model, you should consider a design which allows the usage of Nested Joins.
Business users often need to retrieve information based on the natural key of a table and not the surrogate keys. It‘s always a good idea to think about keeping the natural keys in your physical data model when dealing with tactical workload!
Here is an example of a physical data model which allows the usage of Nested Joins.
CREATE TABLE Sales
ProductId INTEGER NOT NULL,
SalesId INTEGER NOT NULL
) PRIMARY INDEX (ProductId );
CREATE TABLE Products
ProductId INTEGER NOT NULL,
) PRIMARY INDEX (ProductId )
UNIQUE INDEX (ProductCode);
In our example above, the data is accessed via the natural key “ProductCode” and not the surrogate key (ProductId). Therefore, we use a USI to create a direct data access path via the natural key:
SELECT * FROM Products p INNER JOIN Sales s ON p.prod_id = s.prod_id WHERE ProductCode = ‚Private‘
As you can see, the tactical query is resolved with a Nested Join and the USI on column ProductCode.
Forcing all Rows to one AMP
Another interesting tool we have to execute tactical queries against little tables is to force all rows to one AMP, by adding a column to the small table, populating it with just one value. This column has to be the primary index of the table.
Populating the primary index column with just one value will force all table rows to the same AMP, and if the table is sufficiently small, even into the same data block.
If everything works like expected, this results in a singe-AMP retrieve step instead of an All-AMP scan.
If there are many concurrent tactical queries executed against the table at the same time, the entire table will stay in memory, avoiding any access to the disk drive.
If you think about using this approach for several tables, use different values for the primary index of the “dummy” column, to avoid an overload of a single AMP.
Recently, I was involved in an optimization project for a web-frontend tool with Teradata as the backend database.
During the development process, nobody had considered that the queries executed against the database are tactical workload. Most of the workload the tool generated was full table scans.
The web tool had been tested on a development system, but when it went into production, and dozens of people started to work in parallel, together with all the other workload, performance severely. The real problem was not the size of the involved tables, as all tables were quite small.
The real problem was a mix of end-user expectations when navigating in the web frontend, and the setup of the workload management, which did not classify the tool queries as tactical workload.
Users expected an average response time of just a few seconds, which could not be achieved with full table scans on a loaded system (I understood the users. Who wants to wait 30 seconds to save the information of just one client?)
Run times for the main activities were at best 1-2 seconds but increased to a couple of minutes on a busy system. Furthermore, workload delays added to the overall response times.
I began to add unique secondary indexes, but as the tables were tiny, the optimizer decided that a full table scan of the base table would be cheaper than the indexed access.
Finally, it turned out that the usage of join indexes was the solution. As the tool required different access paths to the data, I created several 1:1 join indexes per table, for each of the access paths, only altering the primary index of the join index.
When adding join indexes, we have to keep an eye on the impact of an update, insert and delete statement. In my case, it was no problem, as tables were small.
What I experienced was another issue: The huge number of join indexes caused a lot of deadlocks.
There are several ways to overcome this problem. We can, for example, serialize the user session activities, by using a “serialization” table. Each session has to get a write lock for each activity on this”serialization” table before it can apply any changes to the data.
In my project, I was able to minimize deadlocks by rewriting the critical pieces of code, reducing the number of changes (updates, deletes, inserts) done on the tables.
The last hint: Don't forget to collect statistics on the join indexes. Otherwise, they may not be used!
Teradata Golden Tuning Tipps 2017 – Take your Skills to the next Level!
6 Golden Rules for Teradata SQL Tuning
The Teradata Join Index Guide – We leave no Questions unanswered!
Dramatically improve Performance with the Teradata Nested Join
In our previous articles about Teradata workload management, we covered the basics about how SQL requests are classified into workloads. We talked about throttles, dynamic workload management and the priority scheduler.
This article is covering utility workload. Utility jobs are user requests which are not implementing the SQL protocol. Currently, this is the following ones:
Above mentioned tasks are implementing their protocols. The difference between SQL workload and utility jobs is that utility workload is highly specialized and optimized. Therefore, the more advanced workload classification criteria's, such as “query characteristics” (product joins, ALL-AMP, etc.), are not available (and not needed). Still, the workload management (TASM, TIWM) offers additional features unique to this kind of workload.
Teradata implements below protocols to cover utility workload:
For each utility protocol, several implementations exist (for example, the FastLoad protocol is implemented by the Fastload Utility, the TPT Load operator, and the JDBC Fastload).
Non-Teradata utilities (3rd party) are allowed to apply these protocols (via TPT), but individual workload management features may be missing.
During execution, all utilities are logging on at least one SQL session and several utility sessions (with the exception of BAR, which is not making use of any utility session at all). The mandatory SQL session, called the control session, is used to execute all SQL statements belonging to the utility.
Load and unload utilities may use a second SQL session (the auxiliary session) for the maintenance of a restart log.
Utility resource usage can be managed by:
Concurrency limits are implemented by the usage of throttles. They can be defined on system level and workload level.
Additionally, Teradata workload management implements hard limits which never can be exceeded by the user-defined utility or workload throttles:
Limit for Fastloads + Multiloads + FastExports: 60
Limit for Fastloads + Multiloads: 30
Limit for Fastloads: 30
Limit for Multiloads: 30
Limit for Fastexports: 60
Limit for Backup & Restore: 350
Although the overall hard limit for load & unload utilities together is 60, there exist even more strict limits; For example, it’s not possible to execute more than 30 Fastloads at one point in time, even if the total limit of 60 loads & unload utilities is not reached.
The hard limits always lead to a delay, if there is no user-defined utility or workload throttle with the reject option defined.
The hard limits for concurrency are accompanied by a limit for AMP worker tasks. Only 60% of the available AWT’s can be used by utilities at one point in time.
Even if concurrency limits are not reached, the AWT limits may cause that a utility job is delayed or rejected (important: AWT limits are only checked for FastLoad, Multiload, and FastExport without Spool).
User-defined utility throttles are used to manage resources on a system level. You can define limits such as “Allow 15 Fastloads at one point in time”. These throttles always have to be set in the range of the hard limits we mentioned above.
If you need to arrange your utility workload on a more detailed level, utility workloads which are using their throttles can be defined.
This is achieved by adding at least “utility” as classification criteria to your workload. Additional criteria’s can be used to further detail the workload, such as “request source,” “target” and “query band.”
Like mentioned earlier, SQL protocol specific criteria’s, such as “query characteristics,” are not available for classification.
Limiting the number of utility sessions is another way of managing system resources.
In the same way, as default concurrency limits are defined for utilities, the Teradata workload management is taking care of sessions automatically, if user-defined session limits do not overwrite the limits.
The workload management is distinguishing between small, medium and large amounts of data. By default, medium sized data is expected.
If you are loading small or large amounts of data, the default limits can be adjusted by using the query band below:
SET QUERY_BAND = “UtilityDataSize=SMALL/MEDIUM/LARGE;”;
These are the default rules for utility session limits:
Fastload and Multiload:
Medium Sized Data:
IF (“Number of AMPs” <= 20) THEN “Number of AMPs.”
ELSE MIN(20 + “Number of AMPs” / 20), 100)
Small Sized Data:
“Medium Sized Data” * 0.5
Large Sized Data:
MIN(“Medium Sized Data” * 1.5, “Number of AMPs”)
Medium Sized Data:
IF (“Number of AMPs” <= 4 THEN “Number of AMPs” ELSE 4
Small Sized Data:
“Medium Sized Data” * 0.5
Large Sized Data:
MIN(“Medium Sized Data” * 1.5, 2 * “Number of AMPs”)
If the default session limits are not sufficient, user workload can be defined.
Classification criteria’s for user-defined workload are:
You should always at first design your utility rules on the system level to ensure a healthy behavior of your utility jobs. Specific workloads should only be considered to cover unique situations and requirements.
Dynamic workload management features are only available on systems which are using Teradata Active System Management (TASM). Teradata Integrated Workload Management (TIWM) doesn't offer most of these features.
While TIWM solely relies in static workload definitions (with a few exceptions we will mention later), TASM offers features which allow to adapt the workload management dynamically to certain system conditions.
Dynamic changes are always based on events. TASM distinguishes between time based events (a defined time window is entered or left) and system specific events, such as the event that the number of AMP worker tasks (AWTs) in use surpasses a defined limit.
System specific events are called “health conditions” in TASM, time based events are called the “planned environments”.
The differentiation makes sense: Time based events are always planned. You define a time window and as soon as a certain time is reached, it's being activated. Health conditions on the other hand are not planned or even plannable in advance. Depending on the system load, a health condition is activated, but you can't know in advance when (or if) this will happen.
TASM combines these two type of events in the so-called state matrix. Each combination of a health condition and planned event leads to a certain state in the state matrix. Each state represents a set of workload management rules, such as throttles (request limits, session limits) and priorities,
Of course, as the workload designer, it's up to you how many planned events and health conditions you define. In the simplest case you will have one planned environment (the default one, which is called “Always”) and one health condition (called “Normal”).
A commonly used approach is to define at least two planned environments, one covering the day time (protecting reporting users and applications) and a second one which is giving higher priority to the nightly batch loads. Below you can see a state matrix with two planned environments (“day” and “night”) and two health conditions (“1” and “2”). Each orange box presents a state which is characterized by a certain set of workload rules (in our example, we named them in a way to make it easy to identify their purpose).
TASM is checking at regular intervals if relevant events took place. Initially, TASM checks for health conditions. In a subsequent step it checks for a switch to another planned environment. During each check, one or both type of events could have happened. If at least one event happened, the workload management rules of the related state will be used.
If more than one planned environment matches during the check, the rightmost in the state matrix will be used. Similarly, if more than one health conditions are matching, the one nearest to the bottom of the state matrix is used.
The DBC.DBQLOGTBL table makes it easy to identify for each request in which planned environment and health condition it was executed. This is a mighty tool for the experienced Teradata tuning expert as it allows to analyze performance issues, to identify wrongly assigned queries, and much more.
The following columns carry the relevant information:
OpEnvId = Planned Environment
SysConId = Health Condition
Another useful column is LastStateChange, as it shows the timestamp when the planned environment for the request was activated.
Throttles are used to limit the number of concurrent sessions, requests or utilities. We use throttles in Teradata workload management to protect critical resources – such as memory, AMP worker tasks, and CPU – from being exhausted. The usage of throttles helps to reduce resource contention on systems with a high level of concurrency, usually increasing the total system performance.
Request throttling limits the number of concurrent requests (queries). The defined request limit is ensured at any point in time. The workload management uses a simple counter to keep track of the number of executing requests. If the limit is reached and additional queries are ready for execution, they are added to the so-called delay queue (this is the most used application) or rejected immediately. As soon as the request counter falls below the limit, the first query waiting in the queue (FIFO) will be executed. Requests being executing never will be moved back to the delay queue.
Request throttles can be defined per workload or on the system level.
Workload throttles enforce the limits for the requests executing on behalf of the workload. They are applied after requests have been classified into the workload.
System throttles enforce limits on the system level. Without any further request classification, the limits will be implemented to each and every single query. Nevertheless, all the classification criteria's we explained in the first part of our workload management series (Teradata Workload Management – The Basics) can be used to restrict limits to a group of queries with common characteristics (source, target, queryband, etc.).
System throttles and workload throttles can be active for a request at the same time.
Session limits are used to restrict the number of parallel sessions allowed at one point in time. In contrast to request throttles, enabling the delaying of requests, sessions cannot be delayed. If the session limit is reached, no more sessions can be logged on (logins will be rejected).
Session limits can be applied for the complete system or classification targets (user, account, IP, etc.). Additional classification criteria's, apart from targets, can't be used.
It's much easier to deal with workload throttles than with system throttles. The impact of workload throttles is overseen able and limited to a group of queries. Workloads collect requests with common characteristics; Workload throttles allow to define limits for each group of requests, and the impact of these limits can be easily identified.
Most times, the same limitations can be achieved much easier with workload throttles. System throttles are much more complicated to implement from a classification point of view.
Nevertheless, system throttles have their place in workload management. Usually, they are used to manage situations which are not related to specific workloads but the overall system condition, such as periods of system recovery or critical backup / restore activities. These are examples when it may make sense to limit requests on the system level.
However, you are defining your throttles: Never apply throttling to you tactical workload (the reason should be obvious).
The query log keeps track about all delays caused by either workload or system throttles, helping to identify bottleneck situations and other performance related issues on your system. Two columns are showing us all information we need:
WDDelayTime and DelayTime
WDDelayTime demonstrates the number of seconds a request was a delay in a workload queue. This column is only referring to workload delays.
DelayTime is the total number of seconds the request was delayed, mainly the sum of workload delays and system delays. Unfortunately, as workload delays and system delays can be overlapping for the same request, we can't just subtract WDDelayTime from DelayTime to get the system delays.
There is no way to calculate the system delays if workload delays and system delays happened.
What we know:
All workload management related columns are available as well in the DBC view QryLogTDWM.
Everyone who has some experience with Teradata tuning will have heard the term “workload.” This article will deal with the definition of this term and explain in detail what it is and why it's helpful to map each request (query or load operation) to a workload.
A workload is a group of requests with common characteristics. The grouping of requests into workloads has some advantages over the handling of single requests.
It's more efficient to manage groups of requests than to handle each request individually, as common request characteristics allow to apply standard settings and rules to the group at once.
Grouping requests into workloads give us a better overview of who is using the system and how many resources are used. Armed with this knowledge is is much easier to assign priorities to performance tuning tasks based on workloads (for example, by assigning the highest priority to tune activities for your critical, tactical workload).
Another important concept behind workloads is the assignment of resources (CPU, IOs) and preferences (for example, the frequency and duration of CPU usage) on a group level. Business needs usually define how critical a joint group of requests are. By putting these requests into a workload, common priorities and resource shares can be assigned. The Teradata priority scheduler is taking care of resources and priorities assigned to each workload.
Different releases of Teradata systems use different implementations of priority scheduling. While the Teradata 14 priority scheduler is closely connected with the SLES 11 Linux scheduler and its priority hierarchies, previous Teradata releases running on SLES 10 implement a different way of priority scheduling.
Nevertheless, these details are not important to understand the concept of workloads and workload classification. We will talk about the Teradata priority scheduler in one of our next articles related to workload management.
Whenever a request matches all the characteristics defined for an individual workload, it will be assigned to this workload and the workload specific rules and settings will be applied to the request. This process is called workload classification and the central concept of adding resources and limits to requests.
All workloads are managed in the ViewPoint “Workload Designer.”
Workload characteristics can be any out of the following types below (each characteristic on its own or combined with other features can be used to classify requests into workloads), but be aware that the lists below may not be complete as each release of Teradata adds new features.
1. Request source:
Request source defines where a request was initiated. The request source(s) can be any of the following:
A query will be classified into a workload if a request source matches or does not match the request sources which are defined for this workload. Several request sources can be set for a workload:
Username = ‘ETLUSER1'
Client IP address = 192.168.0.54
If several request sources are defined for a workload they are AND combined (in above example this means that the login user has to be ‘ETLUSER1' and must connect from the client with IP address 192.168.0.54) to be classified into the considered workload. We are assuming that both request sources are used to include them into the workload.
As a general rule: Up to Teradata 13.00, several characteristics have been combined by OR. Since Teradata 13.10 characteristics are coupled with the “AND.” This rule is valid for all type of features (not only for request sources).
Hint: All of these request sources can be found in the DBC.DBQLOGTBL table.
2. Request target:
Each request will be executed against one or several database objects, such as databases, tables, view, macros, stored procedures, and UDFs.
By defining request targets in your workload characteristics, it's possible to assign queries to a workload if certain database objects are accessed.
Since Teradata 14, it's possible to define additional subcriteria for each request target, considering in which way the objects are used. But subcriteria are not allowed for UDFs.
Some examples for subcriteria are:
– Add query or exclude query from workload if the database object is accessed with a full table scan (FTS).
– Add query to workload if the database object is accessed with a particular join type (all or no joins, all or no product joins, all or no unconstrained product
– Add a query to workload if a minimum or a maximum number of rows for each step is estimated.
Several request targets can be combined in a workload definition (AND combined). If subcriteria are used, they are all AND combined.
3. Query Characteristics:
By detailed analysis of each request, a more detailed workload classification is possible. These are the query attributes which are used to classify a query to a workload:
– Join Behaviour: product join, unconstraint product join, no product join, no unconstraint product join, any join, no join.
– Type of Statement: DDL, DML, SQL Select, Collect Statistics.
– AMP Limits: Defines if All-AMP operations should be included or excluded into/from this workload.
– Estimated Processing Time: Adds a query to the workload if the estimated processing time is between the minimum and maximum estimation defined.
– Step Row Count: Adds a query to the workload if the result rows of each step are between the minimum and maximum values specified.
– Final Row Count: Adds a query to the workload if the result rows of the final result set are between the minimum and maximum values defined.
– Full Table Scan: Adds or excludes queries doing a full table scan (FTS).
– Estimated memory usage: Include or exclude request if it exceeds a defined memory usage
4. Utility Usage as Classification Criteria
The workload management offers the possibility to classify load tasks (Fastload,Multiload, , FastExport) and backup/restore tasks into separate workloads.
5. The Queryband as Classification Criteria
By adding a queryband to your session or transaction, this queryband can be used to assign or exclude queries with this queryband to/from a workload.
You will need these classification criter if requests are coming from a single or common request source but you need to classify them into different workloads.
Currently (Teradata 14.10) up to 250 workloads can be defined. Requests which do not match any workload will be assigned to the WD-Default workload, which is available on each Teradata system and cannot be deleted (and it counts against the 250 workload limit).
Start your classification on request source level. If your users are logging on with a standard user, consider adding queryband classification. Querybands will help you to classify your workload on a very detailed level. You should only add additional classification criteria (request target, query characteristics) if needed.
Keep in mind, that an enormous amount of classification rules (include lists, exclude lists), makes maintenance difficult. Furthermore, a lot of standards will be hard to understand and confusing.
Limit as well the number of workloads. All workloads are interacting with each other. A huge number of workloads makes it difficult to understand the overall impact of each workload on the whole system performance.This was the first article of the DWHPro workload management series. Feel free to comment and inform us about errors or things we should change.