Monitoring The Flow Control State on Teradata
The DBC.ResUsageSAWT table is our main tool to analyze the workload when it comes to flow control state.
Before going deeper into details, we will talk briefly about the main characteristics of table DBC.ResUsageSAWT:
Many of the measures of this table represent the status at the end of a 10 minute snapshot interval.
As the database administrator may have changed the interval, it could be different on your system. The column “secs” shows the defined interval duration in seconds.
In our example it's 600 seconds(or 10 minutes).
Let me briefly go over all important columns and their content:
TheDate: The snapshot date.
TheTime: Represents the time when the snapshot was taken.
Together TheDate and TheTime show the status of the AMP worker tasks at the end of each 10 minutes interval.
Secs: Defines the duration of a snapshot interval
VPRID: Identifies the AMP
MailBoxDepth: Shows the number of messages in the AMP queue, at the end of the snapshot interval.
If no AMP worker task is available to take over additional work, the task it will be queued and show up in MailBoxDepth.
FlowControlled: Shows if the AMP is in flow control state at the end of the snapshot interval.
FlowCtlCnt: Shows the number of times the AMP entered flow control state during the snapshot interval.
InUseMax: Shows the maximum number of AMP worker tasks which have been active at the same time during the snapshot interval.
Available: Shows the number of free AMP worker tasks at the end of the snapshot interval.
AvailableMin: Shows the minimum number of free AMP worker tasks during the snapshot interval.
AWTLimit: Shows the number of total AWTs in the system.
WorkTypeMax00, …, WorkTypeMax15: Shows the maximum number AMP worker tasks per work type during the snapshot interval.
I renamed the columns to have a better picture about the type of workload each column presents.
While there are a couple of additional columns available, they usually will not be very helpful for our purpose.
Some of the columns we are showing here only exist in Teradata 14.10 and later. Some of the columns are populated in a different way before release 14.10.
In our test scenario we will analyze the worst flow control situation we can find in DBC.ResUsageSAWT. We are taking a closer look at the AMP which spent the longest time in flow control state on a certain day:
,VPRID as TheAMP
,WorkTypeMax00 AS DispatcherStep
,WorkTypeMax01 AS Spawned_Level1
,WorkTypeMax02 AS Spawned_Level2
,WorkTypeMax03 AS InternalWork
,WorkTypeMax04 AS Recovery
,WorkTypeMax08 AS ExpeditedDispatcherStep
,WorkTypeMax09 AS ExpeditedSpawned_Level1
,WorkTypeMax10 AS ExpeditedSpawned_Level2
,WorkTypeMax12 AS AbortStep
,WorkTypeMax13 AS SpawnedWorkAbort
,WorkTypeMax14 AS UrgentInternalWork
,WorkTypeMax15 AS MostUrgentInterbalWork
WHERE (TheDate,NodeId,VPRID) IN
(SELECT TheDate,NodeId,VPRID FROM DBC.ResUsageSAWT
WHERE FlowCtlTime = (SELECT MAX(FlowCtlTime) FROM DBC.ResUsageSAWT )
The most important measure is “InUseMax” as it gives a clear picture about the system load. Although it is not distinguishing between different types of work, it is the only measure which is capable of showing us the actual maximum number of AMP worker tasks in use during the considered snapshot interval.
It’s not possible to sum up WorkTypeMax00 to WorkTypeMax15, because the maximum numbers are collected independently during each logging interval. While you can use these columns to analyze the behavior of a specific workload type, such as abort steps, they cannot be used to get the correct picture regarding the total system load!
By checking the the InUseMax measure, you will see how the total system workload is consuming AMP worker tasks, which means: how loaded your system is. Of course, InUseMax will always be <= AWTLimit.
While the measure InUseMax only shows the “highest usage” of AMP worker tasks, we have two additional measures which are helping us to get a more detailed picture: Available and AvailableMin.
Both measures can be used to analyze what happened during and at the end of the snapshot period.
Available shows the number of unused AMP worker tasks at the end of the snapshot interval. A consecutive number of snapshots having Available = 0 means that your system ran out of AMP worker tasks over a longer time frame.
Additional insight gives the column AvailableMin, which is measuring the minimum number of free AMP worker tasks during the snapshot interval.
As you can imagine, if Available and AvailableMin are equal to 0 over a consecutive number of snapshots, the system is fully loaded during this time.
While the previously described measures only showed the utilization of AMP worker tasks, we have a couple of measures available directly related to flow control: MailBoxDepth, FlowControlled, FlowCtlCnt, FlowCtlTime.
High AWT usage does not necessarily mean that the system is in flow control. Only if all AMP worker tasks are in use and the message queues are filled, flow control state is initiated.
While often MailBoxDepth is > 0 if the AMP is in flow control state, it’s not always like this because this measure presents only the size of the message queue at the end of each snapshot interval. It could be, that the AMP was in flow control state during the snapshot interval, but the queue is empty at the end of the interval. Keep this in mind.
FlowControlled is just an indicator showing if the AMP is in flow control at the end of the snapshot interval. This does not mean that it hasn’t been flow controlled during the snapshot interval. Using this measure on its own can be misleading!
FlowCtlCnt indicates how often the AMP entered the flow control state during the snapshot interval.
FlowCtlTime is the total time the AMP was in flow control during the snapshot interval (in milliseconds)
Using above measure together will give you good insight into the load of your system.
By summing up the MaxInUse column, you will already have a good overview about your system load. This can be a good starting point to go more into the details.