Monitoring The Flow Control State on Teradata

Roland Wenzlofsky

August 3, 2015

minutes reading time


The DBC.ResUsageSAWT table is our primary tool to analyze the workload regarding the flow control state.
Before going deeper into details, we will brief about the main characteristics of table DBC.ResUsageSAWT:

Many of the measures in this table represent the status at the end of a 10-minute snapshot interval.
As the database administrator may have changed the interval,  it could be different on your system. The column “secs” shows the defined interval duration in seconds.
In our example, it’s 600 seconds(or 10 minutes).

The detail level is one row per Date, Time (every 10 minutes), Node, and AMP.

Let me briefly go over all essential columns and their content:

TheDate: The snapshot date.
TheTime: Represents the time when the snapshot was taken.

Together TheDate and TheTime show the status of the AMP worker tasks at the end of each 10 minutes interval.

Secs: Defines the duration of a snapshot interval

VPRID: Identifies the AMP

MailBoxDepth: This shows the number of messages in the AMP queue at the end of the snapshot interval.
If no AMP worker task is available to take over additional work, the task will be queued and show up in MailBoxDepth.

FlowControlled: This shows if the AMP is in a flow control state at the end of the snapshot interval.

FlowCtlCnt: This shows the number of times the AMP entered the flow control state during the snapshot interval.

InUseMax: Shows the maximum number of AMP worker tasks that have been active simultaneously during the snapshot interval.

Available: Shows the number of free AMP worker tasks at the end of the snapshot interval.

AvailableMin: Shows the minimum number of free AMP worker tasks during the snapshot interval.

AWTLimit: This shows the number of total AWTs in the system.

WorkTypeMax00, …, WorkTypeMax15: Shows the maximum number of AMP worker tasks per work type during the snapshot interval.
I renamed the columns better to understand the type of workload each column presents.

While a couple of additional columns are available, they usually will not be beneficial for our purpose.

Some of the columns we are showing here only exist in Teradata 14.10 and later. Some of the columns are populated differently before release 14.10.

In our test scenario we will analyze the worst flow control situation we can find in DBC.ResUsageSAWT. We are taking a closer look at the AMP which spent the longest time in flow control state on a certain day:

SELECT
TheDate
,TheTime
,Secs
,VPRID as TheAMP
,MailBoxDepth
,FlowCOntrolled
,FlowCtlCnt
,InUseMax
,Available
,AvailableMin
,AWTLimit
,WorkTypeMax00 AS DispatcherStep
,WorkTypeMax01 AS Spawned_Level1
,WorkTypeMax02 AS Spawned_Level2
,WorkTypeMax03 AS InternalWork
,WorkTypeMax04 AS Recovery
,WorkTypeMax08 AS ExpeditedDispatcherStep
,WorkTypeMax09 AS ExpeditedSpawned_Level1
,WorkTypeMax10 AS ExpeditedSpawned_Level2
,WorkTypeMax12 AS AbortStep
,WorkTypeMax13 AS SpawnedWorkAbort
,WorkTypeMax14 AS UrgentInternalWork
,WorkTypeMax15 AS MostUrgentInterbalWork
FROM DBC.ResUsageSAWT
WHERE (TheDate,NodeId,VPRID) IN
(SELECT TheDate,NodeId,VPRID FROM DBC.ResUsageSAWT
WHERE FlowCtlTime = (SELECT MAX(FlowCtlTime) FROM DBC.ResUsageSAWT )
);
The most critical measure is “InUseMax”, which gives a clear picture of the system load. Although it is not distinguishing between different types of work, it is the only measure capable of showing us the maximum number of AMP worker tasks in use during the considered snapshot interval.

It’s impossible to sum up, WorkTypeMax00 to WorkTypeMax15, because the maximum numbers are collected independently during each logging interval. While you can use these columns to analyze the behavior of a specific workload type, such as abort steps, they cannot be used to get the correct picture regarding the total system load!

By checking the the InUseMax measure, you will see how the total system workload is consuming AMP worker tasks, which means: how loaded your system is. Of course, InUseMax will always be <= AWTLimit.

While the measure InUseMax only shows the “highest usage” of AMP worker tasks, we have two additional measures which are helping us to get a more detailed picture: Available and AvailableMin.

Both measures can be used to analyze what happened during and at the end of the snapshot period.

Available shows the number of unused AMP worker tasks at the end of the snapshot interval. A consecutive number of snapshots having Available = 0 means that your system ran out of AMP worker tasks over a longer time frame.

Additional insight gives the column AvailableMin, which measures the minimum number of free AMP worker tasks during the snapshot interval.

As you can imagine, if Available and AvailableMin are equal to 0 over a consecutive number of snapshots, the system is fully loaded during this time.
While the previously described measures only showed the utilization of AMP worker tasks, we have a couple of measures directly related to flow control: MailBoxDepth, FlowControlled, FlowCtlCnt, FlowCtlTime.

High AWT usage does not necessarily mean the system is in flow control. A flow control state is initiated if all AMP worker tasks are in use and the message queues are filled.
While often MailBoxDepth is > 0 if the AMP is in a flow control state, it’s not always like this because this measure presents only the message queue size at the end of each snapshot interval. It could be that the AMP was in a flow control state during the snapshot interval, but the queue was empty at the end. Keep this in mind.

FlowControlled is just an indicator showing if the AMP is in flow control at the end of the snapshot interval. This does not mean it hasn’t been flow controlled during the snapshot interval. Using this measure on its own can be misleading!

FlowCtlCnt indicates how often the AMP entered the flow control state during the snapshot interval.

FlowCtlTime is the total time the AMP was in flow control during the snapshot interval (in milliseconds)

Using the above measure together will give you a good insight into a load of your system.
By summing up the MaxInUse column, you will already have a good overview of your system load. This can be a good starting point to go more into the details.

MaxUsage

In my example below, the system has a maximum of 120 AWTs per AMP (default is 80):

  • Avatar
    Tomas Gorner says:

    Hi Roland,

    Very nice summary, straight to the point.
    There are 2 typos though, that can be a bit misleading:

    – “Available shows the number of AMP worker tasks in use at the end of the snapshot interval.”
    It’s rather: “Available shows the number of AMP worker tasks available at the end of the snapshot interval.”.

    – “… AvailableMin, which is measuring the minimum number of AMP worker tasks having been used at one point in time during the snapshot interval.”
    It’s rather: “… AvailableMin, which is measuring the minimum number of AMP worker tasks having been available at one point…”

    You might want to correct that.

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like

    >