Teradata Workload Management – The Priority Scheduler

Roland Wenzlofsky

August 19, 2015



In our previous article, we defined the term workload and described how Teradata workload management classifies requests (queries) into workloads. We briefly mentioned that resource allocation (CPU seconds and I/Os) is one of the main reasons for grouping requests.

If you are new to workload management, I highly recommend reading the first article of our workload management series:
Teradata Workload Management – The Basics

In this article, we take a closer look at the Teradata priority scheduler and at how resources are distributed across workloads and requests. The priority scheduler is the software responsible for assigning available resources to active requests.

Since Teradata 14, the Teradata priority scheduler has been closely linked to the Linux scheduler of SLES 11, the so-called “Completely Fair Scheduler” (CFS). How the CFS balances resources (how many CPU seconds a task gets, and how often) will be covered in one of our next articles.

We will not discuss prior versions of the Teradata priority scheduler for a simple reason: the share of Teradata 13 sites running on SLES 10 is rapidly decreasing, and this information would soon be obsolete.

The Priority Scheduler Hierarchy

The CFS implements priorities using hierarchies. The Teradata priority scheduler is built on top of the CFS and is therefore quite similar. Below you can see the graphic of a typical Teradata 14 priority scheduler hierarchy:

[Figure: Teradata Workload Management, priority scheduler hierarchy]

All modern Teradata systems have some form of workload management. Currently, this is either Teradata Active System Management (TASM) or Teradata Integrated Workload Management (TIWM). TASM is the more advanced of the two and offers a few more features.

The graphical representation above shows the differences between TASM and TIWM when it comes to priority scheduler hierarchies (I highlighted the differences in orange): TIWM offers only one virtual partition and does not support SLG workload tiers.

Resource Allocation Rules

System resources are allocated from the top of the hierarchy to the bottom. A very simple and elegant solution.

The simple rule of thumb is: The most important workload should be assigned to the top of the hierarchy, the least critical workload to the bottom.

Teradata workload management defines three primary levels: Tactical, SLG (TASM only), and Timeshare. These are the three levels where you define your workloads. Each of these levels implements some particular features, and we will soon discuss each of them in detail.

Each level can consume all the resources it needs to execute its tasks, but to avoid situations where higher levels consume all resources and lower levels are completely cut off, a safety mechanism exists:

The “remaining” workload on the tactical tier and the SLG tiers is a system-managed workload definition that reserves at least 5% of the resources on each of these tiers for lower levels in the hierarchy. These “remaining” workload definitions connect adjacent levels and allow available resources to flow down one level in the hierarchy.

Depending on the setup and the actual load situation on the system, the resources for the “remaining” dummy workload can be much higher (and this is highly recommended).

Unused resources from the tactical and SLG tiers are passed down to the lower levels of the hierarchy; resources flow from the top to the bottom.
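The top-down flow described above can be sketched in a few lines of Python. This is only a simplified model, not Teradata's actual algorithm: the function name `flow_down`, its parameters, and the sample demands are made up for illustration, and every tier is assumed to keep exactly the minimum of 5% of its inflow for the “remaining” workload.

```python
def flow_down(incoming, tier_demands, min_reserve=0.05):
    """Walk the tiers top to bottom and return (consumed_per_tier, leftover).

    incoming     -- resources entering the top tier (e.g. in percent)
    tier_demands -- what each tier would like to consume, top tier first
    min_reserve  -- fraction of each tier's inflow kept for the level below
                    (assumption: the 5% minimum, applied to every tier)
    """
    consumed = []
    for demand in tier_demands:
        # A tier takes what it needs, but never the part reserved
        # for the "remaining" workload.
        cap = incoming * (1.0 - min_reserve)
        used = min(demand, cap)
        consumed.append(used)
        # Whatever was not used flows down one level.
        incoming -= used
    return consumed, incoming


# A greedy tactical tier followed by one SLG tier with modest demand.
consumed, leftover = flow_down(100.0, [200.0, 3.0])
print(consumed, leftover)
```

In this sketch, `leftover` is what finally arrives at the timeshare tier. Even though the first tier asks for far more than is available, it can never starve the levels below it.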

It is advisable to assign your most critical requests to a workload located at the top of the hierarchy (most probably in the tactical tier).

The different Levels of the Priority Hierarchy

The entire priority hierarchy consists of these levels:

  • TDAT
  • User
  • One or several virtual partitions (TIWM: always one)
  • A tactical workload tier
  • Between 0 and n SLG workload tiers (TASM only)
  • One timeshare tier

The TDAT Priority Level, User and System Tasks

At the top of the hierarchy, there is always TDAT, immediately followed by USER, DFLT (Default), and SYS (System). Teradata system tasks run under DFLT and SYS; user-initiated tasks run below USER. SYS and DFLT tasks always have their resource requirements satisfied before any user tasks receive resources. Both SYS and DFLT execute critical system-related tasks.

The USER level groups all tasks running on behalf of database users, i.e. no internal tasks.

Splitting the System – Teradata Virtual Partitions

Virtual partitions are available on Teradata systems running on at least SLES 11, as this feature is linked to the concept of SLES 11 priority hierarchies. Only systems with Teradata 14 or higher have access to this feature.

The idea behind virtual partitions is to split the system resources on the highest level.

Virtual partitions are often used in situations where several business owners share a common Teradata system. Each owner is able to consume its full percentage of the system resources.
Note that the assigned share is not exclusively reserved: if one of the virtual partitions is not fully using its assigned resources at some point in time, other virtual partitions are allowed to use them in the meantime.
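The following Python sketch illustrates this borrowing between virtual partitions. It is a simplified model under stated assumptions: the function name `allocate_partitions` and the partition names are hypothetical, and the rule that spare capacity is handed out in proportion to the assigned shares is an assumption made for the example, not documented Teradata behavior.

```python
def allocate_partitions(shares, demands):
    """Return the percentage of the system granted to each virtual partition.

    shares  -- dict: partition name -> assigned share (percent of the system)
    demands -- dict: partition name -> what the partition currently wants
    """
    # Every partition first gets what it needs, up to its assigned share.
    granted = {name: min(shares[name], demands[name]) for name in shares}
    spare = sum(shares.values()) - sum(granted.values())

    # Spare capacity is lent to partitions that still want more
    # (assumption: proportional to their assigned shares).
    while spare > 1e-9:
        hungry = {n: shares[n] for n in shares if demands[n] - granted[n] > 1e-9}
        weight_sum = sum(hungry.values())
        if not hungry or weight_sum == 0:
            break
        handed = 0.0
        for name, weight in hungry.items():
            extra = min(spare * weight / weight_sum, demands[name] - granted[name])
            granted[name] += extra
            handed += extra
        spare -= handed
    return granted


# Partition "A" is mostly idle, so "B" may temporarily exceed its 40% share.
print(allocate_partitions({"A": 60.0, "B": 40.0}, {"A": 20.0, "B": 90.0}))
```

As soon as partition “A” becomes busy again, a fresh calculation with its higher demand gives it back its full 60%.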

As mentioned earlier in this article, TIWM does not allow system splitting. It implements exactly one virtual partition. You need TASM to use this feature.

Several workloads can be assigned to each virtual partition.

The Tactical Workload Tier

The next lower level in the priority hierarchy is used for tactical workloads. Tactical workloads can consume as many resources as they need (apart from the 5% that is always reserved for the “remaining” workload).

Workloads defined at the tactical level have another advantage that is unique to this level: their tasks are automatically expedited and can make use of reserved AMP worker tasks (AWTs). Furthermore, tactical workload tasks can quickly interrupt other tasks and take over the CPU (the suspended tasks are continued afterward). While preemption itself is not unique to tactical workloads, only tactical workloads get access to the CPU immediately. Tasks from other levels usually only get to interrupt a running task if that task has consumed much more CPU than the one interrupting it.

By assigning long-running and resource-intensive queries to tactical workloads, you could completely cut off the workloads defined on lower levels from resources.

Therefore, you should only assign “real” tactical requests to the tactical tier: Single-AMP or group-AMP operations.

The designers of Teradata workload management anticipated this risk and added a safety mechanism to avoid such situations: tactical workload exceptions.

Tactical workload exceptions allow requests running on the tactical tier to be moved to a workload located further down in the hierarchy. This exception mechanism cannot be turned off; it is executed automatically. It is triggered once a request has consumed a certain (adjustable) amount of CPU seconds.

You should monitor this exception to identify requests that regularly trigger it. They should be moved down to a workload located lower in the priority hierarchy.
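The logic of a tactical workload exception can be pictured with a small Python sketch. Everything here is hypothetical and only mirrors the description above: the `Request` class, the function `apply_tactical_exception`, the workload names, and the CPU limit of 10 seconds are invented for the example and are not Teradata objects or defaults.

```python
from dataclasses import dataclass


@dataclass
class Request:
    request_id: int
    workload: str             # name of the workload the request runs in
    cpu_seconds: float = 0.0  # CPU consumed so far


def apply_tactical_exception(request, cpu_limit_seconds, demote_to):
    """Move a tactical request to a lower workload once it exceeds the CPU limit.

    Returns True if the request was demoted, which is the event worth monitoring.
    """
    if request.workload == "tactical" and request.cpu_seconds > cpu_limit_seconds:
        request.workload = demote_to
        return True
    return False


# A long-running query that was wrongly classified as tactical.
heavy = Request(request_id=1, workload="tactical", cpu_seconds=12.5)
demoted = apply_tactical_exception(heavy, cpu_limit_seconds=10.0, demote_to="slg_tier_1")
print(demoted, heavy.workload)
```

Counting how often the function returns True per user or application is one simple way to spot requests that do not belong on the tactical tier.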

The SLG Workload Tier(s)

Below the tactical tier, you will find the SLG tiers (TASM) or immediately the timeshare tier (TIWM).

Each SLG tier workload carries a relative weight. The relative weight determines what percentage of the resources flowing into the SLG tier (from the level above in the hierarchy) the workload can consume, and it defines the upper resource limit for the workload (with one exception that we will explain soon). The requests executed in this workload may consume fewer resources.

In the example graphic below, we assign 20% of “Tier 1” resources to “Workload 1” and 30% to “Workload 2”.

If not all workloads on an SLG tier can use their assigned resources, the unused resources will be equally distributed across all other workloads in need. If resources still remain, they will be allocated to the “remaining” workload and passed down in the hierarchy.

In this example, the “remaining” workload holds 50% of the tier 1 resources (100% minus 20% minus 30%), so at least 50% will always flow down to the level below tier 1 (another SLG tier or the timeshare tier, depending on the setup). If the workloads on tier 1 cannot use the 50% assigned to them (20% + 30%), and the unused share cannot be reassigned between workloads on tier 1, all unused resources are added to the 50% of the “remaining” workload.

If, for example, not a single request is active on tier 1, 100% of tier 1’s resources will be passed to the level below it, which is either another SLG tier or the timeshare tier.
It is important to understand that we are talking about relative weights. For example, “Workload 1, Tier 1” will be able to use up to 20% of the resources received from the previous tier, not 20% of the complete Teradata system!
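The tier 1 example (20% for “Workload 1”, 30% for “Workload 2”, 50% for “remaining”) can be reproduced with a short Python sketch. The function name `distribute_slg_tier` and the demand figures are made up for illustration; the sketch follows the rules as described in this article and is not Teradata's implementation. All numbers are percentages of the resources entering the tier.

```python
def distribute_slg_tier(weights, demands):
    """Return (granted_per_workload, passed_down) in percent of the tier inflow.

    weights -- dict: workload name -> relative weight
    demands -- dict: workload name -> what the workload currently wants
    """
    # Each workload gets what it needs, capped by its relative weight.
    granted = {w: min(weights[w], demands[w]) for w in weights}
    unused = sum(weights.values()) - sum(granted.values())

    # Unused workload shares are split equally among workloads still in need.
    while unused > 1e-9:
        needy = [w for w in weights if demands[w] - granted[w] > 1e-9]
        if not needy:
            break
        portion = unused / len(needy)
        for w in needy:
            extra = min(portion, demands[w] - granted[w])
            granted[w] += extra
            unused -= extra

    # The "remaining" share plus anything nobody wanted flows down one level.
    remaining = 100.0 - sum(weights.values())
    return granted, remaining + unused


weights = {"Workload 1": 20.0, "Workload 2": 30.0}
print(distribute_slg_tier(weights, {"Workload 1": 5.0, "Workload 2": 60.0}))
print(distribute_slg_tier(weights, {"Workload 1": 0.0, "Workload 2": 0.0}))
```

The first call shows “Workload 2” borrowing what “Workload 1” leaves unused while 50% still flows down; the second call shows an idle tier passing on 100% of its inflow.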

Be aware, furthermore, that all active requests of a workload share its relative weight. If one request is active in “Workload 1, Tier 1”, it can use the available 20%; if two requests are active, each of them can use up to 10%, provided that together they consume the full 20% share. Concurrency within a workload makes a difference in the resources given to each task!
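A little arithmetic in Python makes both effects visible: relative weights shrink when expressed as a share of the whole system, and they are divided among concurrent requests. The helper names and the assumption that 40% of the system reaches tier 1 are purely illustrative; `Fraction` is used only to keep the percentages exact.

```python
from fractions import Fraction


def system_share(tier_inflow, relative_weight):
    """Share of the whole system that one SLG workload can use at most."""
    return tier_inflow * relative_weight


def share_per_request(workload_share, active_requests):
    """Upper bound per request when all active requests are equally busy."""
    if active_requests < 1:
        raise ValueError("need at least one active request")
    return workload_share / active_requests


tier_inflow = Fraction(40, 100)  # assumed: 40% of the system reaches tier 1
weight = Fraction(20, 100)       # relative weight of "Workload 1"

workload = system_share(tier_inflow, weight)
print(float(workload))                        # share of the total system
print(float(share_per_request(workload, 2)))  # per request with two active requests
```

Under these assumptions, a relative weight of 20% corresponds to only 8% of the total system, and two concurrent requests would get up to 4% each.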

[Figure: The Teradata Priority Scheduler Hierarchy]

The alert reader will have noticed that no relative weights are used in the tactical tier. All requests running in workloads located in the tactical tier have access to all available resources, without any limitation. The idea is that only tactical requests are found in the tactical tier.

One last warning: keep the number of SLG tiers as small as possible. The more SLG tiers you create, the less stable the run-time behavior of your requests will be, as the requests on lower levels of the hierarchy depend on what is left over from higher hierarchy levels.

Furthermore, don’t be fooled by the idea of assigning as many resources as possible to the top levels (tactical, highest SLG levels). While it sounds tempting to secure sufficient resources for your critical workload this way, keep in mind that all unused resources reaching the timeshare tier are made available to workloads in need anyway.

The Timeshare Tier

The timeshare tier represents the lowest level in the priority hierarchy. While the SLG tiers are only available in TASM (but don’t have to be used at all), the timeshare tier is available in both TASM and TIWM.

The remaining resources from the lowest SLG tier (or from the tactical tier, if you are using TIWM or have not defined SLG tiers in TASM) are made available in the timeshare tier. Resource allocation is implemented differently for workloads defined as timeshare workloads:

The timeshare tier defines four resource allocation levels: Top, High, Medium, and Low.

Fixed ratios are defined between these four levels. Unlike on the SLG tier(s), no relative weights per workload are used:

  • Each request in a workload of group “Top” can consume eight times more than any request of a workload of group “Low.”
  • Each request in a workload of group “High” can consume four times more than any request of a workload of group “Low.”
  • Each request in a workload of group “Medium” can consume two times more than any request of a workload of group “Low.”

This rule is valid across all workloads defined on the timeshare tier.

While all requests of one workload on an SLG tier are competing for the relative resources assigned to exactly this workload, all timeshare requests of all defined workloads compete for the total resources entering the timeshare tier.

Each “Top” request will always get eight times more resources than each “Low” request, no matter how many requests are active in any of the four levels (top, high, medium, low). All workloads defined in the timeshare tier are sharing resources.
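The fixed 8:4:2:1 ratios translate into per-request shares as in the following Python sketch. The names `TIMESHARE_WEIGHTS` and `per_request_shares` and the request counts are hypothetical; the sketch assumes every active request is busy and simply applies the ratios from the list above.

```python
from fractions import Fraction

# Ratios taken from the list above, relative to "Low".
TIMESHARE_WEIGHTS = {"top": 8, "high": 4, "medium": 2, "low": 1}


def per_request_shares(active_counts):
    """Return the share of the timeshare inflow that ONE request of each level gets.

    active_counts -- dict: level name -> number of active requests
    """
    total = sum(TIMESHARE_WEIGHTS[level] * n for level, n in active_counts.items())
    if total == 0:
        return {}
    return {
        level: Fraction(TIMESHARE_WEIGHTS[level], total)
        for level, n in active_counts.items()
        if n > 0
    }


# One "Top" request competing with eight "Low" requests.
shares = per_request_shares({"top": 1, "low": 8})
print(shares["top"], shares["low"])
```

Here the single “Top” request receives half of the resources entering the timeshare tier and each “Low” request one sixteenth, so the 8:1 ratio per request holds regardless of how many requests are active.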

[Figure: The Teradata Priority Scheduler Hierarchy]

As soon as the remaining resources reach the timeshare tier, there are two possibilities:

If the workloads defined as timeshare consume all remaining resources, 100% of the system resources have been consumed and distributed. If the timeshare workloads do not consume all remaining resources, the rest is made available to other workloads in need.

