Tactical Workload Tuning on Teradata

7
1848
Tactical Workload

Teradata Tactical Workload Tuning

UPDATED Tactical workload tuning is a particular discipline on Teradata. Tactical workload requires a unique skill set and tools.

In this article, I will show you the skills needed to optimize tactical workload.

You will learn how to use the proper tools and the “tactical workload skill set.”

The Difference between Tactical and Decision Support Workload

Decision support workload typically involves full table scans (FTS) and all-row joins -usually between big tables.
Indexing (NUSI, USI, etc.) can improve performance, but more often all table rows are accessed.

Here is an example of a decision support query:

SELECT CustomerGroupName, SUM(Revenue)
FROM
Sales t01
INNER JOIN
CustomerGroup t02
ON
t01.GroupCd = t02.GroupCd
GROUP BY 1;

 

Teradata Tactical Workload Tuning

The tactical workload is characterized by single row access or the access to a limited number of rows. Here is an example for a tactical query:

SELECT *
FROM Sales
WHERE SalesId = 100;

Optimized tactical queries need indexed access paths, such as UPI, NUPI, NUSI, USI, Join Indexes or a single row partition. The primary goal is to have an execution plan with each step only delivering one or a few rows.

Teradata Tactical Workload Tuning

The tactical selection criteria of your query have to be applied as early as possible in the execution plan.
While this allows avoiding deadlocks entirely, serialization might not be useable in any case for performance reasons.
The solution I used was to rewrite the most executed statements, increasing the number of changes to the database. By keeping the number of changes small, I was able to prevent deadlocks.
With a proper query design, tactical query execution is stable, and performance is not decreasing if table cardinality is growing, while decision support queries usually become slower with a growing amount of data:

TacticalDSS

Tactical query performance does not improve with the number of Teradata nodes. Adding more nodes to your Teradata system will not have any impact on the performance of your single tactical query. Nevertheless, adding nodes to your system allows more tactical queries to be executed at the same time.

The performance of decision support queries improves with the number of nodes. Assuming a perfect data distribution (no skewing), doubling the number of nodes can speed up your query by a factor of two (nevertheless, this are pure numbers, no system is working 100% in parallel and without any skewing effects).

The Tactical Workload Toolkit

Besides optimizing the retrieve steps with indexed access to single rows (USI, UPI, NUSI, Join Index, etc.), we have to optimize the joins. Join optimization means to minimize the number of AMPs involved in the join operations.

For example, the execution plan of below query shows that the selected row is accessed from only one AMP (single-AMP join step):

SELECT * FROM Sales s INNER JOIN SalesDetails sd ON s.SalesId = sd.SalesId WHERE s.SalesId = 100; 1)

First, we do a single-AMP JOIN step from Indexing.t01 by way of the primary index “Sales.s.SalesId = 100″
with no residual conditions, which is joined to Sales.SalesDetails by way of the primary index “Sales.sd.SalesId = 100″
with a residual condition of (“Sales.SalesDetails.SalesId = 100″). Sales.Sales and Sales.SalesDetails
are joined using a merge join, with a join condition of (“Sales.Sales.SalesId = Sales.SalesDetails.SalesId”).
The result goes into Spool 1 (one-amp), which is built locally on that AMP.
The size of Spool 1 is estimated with low confidence to be 151 rows.

Above query will always perform identical (given the execution plan is not changing), and this is what we want to achieve in tactical workload tuning: A stable direct access to the data.

While above example was built around a merge join, the most suitable join type for the tactical workload is the Nested Join.

When designing your physical data model, you should consider a design which allows the usage of Nested Joins.

Business users often need to retrieve information based on the natural key of a table and not the surrogate keys. It‘s always a good idea to think about keeping the natural keys in your physical data model when dealing with tactical workload!

Here is an example of a physical data model which allows the usage of Nested Joins.

CREATE TABLE Sales
(
ProductId INTEGER NOT NULL,
SalesId INTEGER NOT NULL
) PRIMARY INDEX (ProductId );

CREATE TABLE Products
(
ProductId INTEGER NOT NULL,
ProductCode CHAR(10)
) PRIMARY INDEX (ProductId )
UNIQUE INDEX (ProductCode);

In our example above, the data is accessed via the natural key “ProductCode” and not the surrogate key (ProductId). Therefore, we use a USI to create a direct data access path via the natural key:

SELECT * FROM Products p INNER JOIN Sales s ON p.prod_id = s.prod_id WHERE ProductCode = ‚Private‘

1)First, we do a two-AMP JOIN step from Sales.Products by way of unique index # 4
“Sales.p.ProductCode = ‘Private’" with no residual conditions, which is joined to Indexing.s by
The way of unique index # 4 “Sales.s.ProductId = Sales.p.ProductId" with no residual conditions.
Sales.p and Sales.s are joined using a nested join, with a join condition of ("(1=1)").
The result goes into Spool 1 (one-amp), which is built locally on the AMPs.

As you can see, the tactical query is resolved with a Nested Join and the USI on column ProductCode.

Forcing all Rows to one AMP

Another interesting tool we have to execute tactical queries against little tables is to force all rows to one AMP, by adding a column to the small table, populating it with just one value. This column has to be the primary index of the table.

Populating the primary index column with just one value will force all table rows to the same AMP, and if the table is sufficiently small, even into the same data block.

If everything works like expected, this results in a singe-AMP retrieve step instead of an All-AMP scan.

If there are many concurrent tactical queries executed against the table at the same time, the entire table will stay in memory, avoiding any access to the disk drive.

If you think about using this approach for several tables, use different values for the primary index of the “dummy” column, to avoid an overload of a single AMP.

NEWA Case Study: Tactical Workload and Join Indexes

Recently, I was involved in an optimization project for a web-frontend tool with Teradata as the backend database.

During the development process, nobody had considered that the queries executed against the database are tactical workload. Most of the workload the tool generated was full table scans.

The web tool had been tested on a development system, but when it went into production, and dozens of people started to work in parallel, together with all the other workload, performance severely. The real problem was not the size of the involved tables, as all tables were quite small.

The real problem was a mix of end-user expectations when navigating in the web frontend, and the setup of the workload management, which did not classify the tool queries as tactical workload.

Users expected an average response time of just a few seconds, which could not be achieved with full table scans on a loaded system (I understood the users. Who wants to wait 30 seconds to save the information of just one client?)

Run times for the main activities were at best 1-2 seconds but increased to a couple of minutes on a busy system. Furthermore, workload delays added to the overall response times.

I began to add unique secondary indexes, but as the tables were tiny, the optimizer decided that a full table scan of the base table would be cheaper than the indexed access.

Finally, it turned out that the usage of join indexes was the solution. As the tool required different access paths to the data, I created several 1:1 join indexes per table, for each of the access paths, only altering the primary index of the join index.

When adding join indexes, we have to keep an eye on the impact of an update, insert and delete statement. In my case, it was no problem, as tables were small.

What I experienced was another issue: The huge number of join indexes caused a lot of deadlocks.

There are several ways to overcome this problem. We can, for example, serialize the user session activities, by using a “serialization” table. Each session has to get a write lock for each activity on this”serialization” table before it can apply any changes to the data.

In my project, I was able to minimize deadlocks by rewriting the critical pieces of code, reducing the number of changes (updates, deletes, inserts) done on the tables.

The last hint: Don’t forget to collect statistics on the join indexes. Otherwise, they may not be used!

See also:
Teradata Golden Tuning Tipps 2017 – Take your Skills to the next Level!
6 Golden Rules for Teradata SQL Tuning
The Teradata Join Index Guide – We leave no Questions unanswered!
Dramatically improve Performance with the Teradata Nested Join

Questions?
If you have any questions about all this, please ask in the comments! I’ll be paying close attention and answering as many as I can. Thank you for reading. Whatever this blog has become, I owe it all to you.
Our Reader Score
[Total: 18    Average: 4.4/5]
Tactical Workload Tuning on Teradata written by Roland Wenzlofsky on February 11, 2017 average rating 4.4/5 - 18 user ratings

7 COMMENTS

  1. Sorry. There is small correction
    5.1. if you want to modify some rows and tables in one transaction then it is good to get most restrictive lock for participated rows in all tables involved into transaction at the most granular level(rowid is the best) at first the very first step. This prevents deadlocks.

  2. Hello.

    Thank you for good article. It easy to read and contains a helpful material.

    Let me say some suggestions:
    1. The point below does not concern the article theme but it is about query example in the text.
    The query
    SELECT CustomerGroupName, SUM(Revenue)
    FROM
    Sales t01
    INNER JOIN
    CustomerGroup t02
    ON
    t01.GroupCd = t02.GroupCd
    GROUP BY 1;
    looks to be bad, because there is a grouping by text field which should be avoided as much as possible in teradata. In short, each grouping operation requires building some auxiliary table in the memory. The more column in grouping clause and the wider these columns the more resources are required for building that auxiliary table to support grouping operation.
    If grouping column has variable text type, then teradata reserve maximum possible size for this type. For example if there is a grouping by CustomerGroupName column which is VARCHAR(400) UNICODE16 type then teradata reserves 800 bytes per each grouped row even then the actual value of the column is very small ( for example for “abc” is only 6 bytes). This leads to performance problems.
    It is usually easy to solve it,just by switching grouping column to ID

    SELECT MIN(CustomerGroupName) AS CustomerGroupName
    , SUM(Revenue)
    FROM Sales t01
    INNER JOIN CustomerGroup t02 ON t01.GroupCd = t02.GroupCd
    GROUP BY t01.GroupCd;

    2. It is excellent practice to use bind variables for tactical queries. This helps to reuse query plans and reduce count of internal locks.

    3. It is good to set up a lot of criteria and rules in TDWM for tactical queries to prevent moving tactical queries into non-tactical workload groups.

    4. For small batch operations( like updating 10 rows by PI) it is good to think about multistatement request or macros, they reduce context switching.

    5. There are some additional best practices for locking for tactical queries(described in Teradata request processing book), like this:
    5.1. if you want to modify some rows and tables in one transaction then it is good to get most restrictive lock for all rows in tables at first the first step. This prevents deadlocks.
    5.2.if an application may do a dirty read in some query(non tactical), let it make dirty read, and reduce count of lock.
    ….

    Thank you.

    Best regards,
    Aleksei Svitin.

    • Hi Aleksei,

      Thanks for sharing your ideas. Especially point 3 is very important in my experience. A few months ago I worked on an optimization project. It was mainly about tactical requests. I had the situation that TDWM was not setup correctly to tackle the tactical workload.

      All techniques I had applied would have been a huge improvement for these tactical queries, but workload management delayed them like non-tactical queries. This lead to the stupid situation, that my optimized queries, accessing single rows via a single table join index, had a runtime of about on second but a workload delay time of a few minutes.

  3. Another interesting technique for tuning ALL-AMP tactical Queries.

    Let consider a scenario where we have a web-portal accessing few small semantic aggregated/summarised tables and we have a tight SLA on response time. Most of the portal queries doing a all-amp retrieve steps from these tables and there are no opportunities for indexing or any sort of tuning.

    A technique here which can useful is by adding a deliberate skew by selecting highly non-unique primary index column/s. So in case 100 AMPs system, instead of data being evenly distributed on all 100 AMPs, using highly non-unique primary index column/s distribute the data only on 25 AMPs.

    In first case, rows are retrieved from all 100 AMPs reading large number data blocks.

    In second case, rows are retrieved only from 25 AMPs reading few number data blocks, hence reducing the number of data blocks to read.

    By adding deliberate skew for smalls tables, significant savings in response time can be achieved for All-AMP Tactical Queries.

    • Hi Siddesh,

      Thanks for sharing this technique with us. I assume by forcing all rows of a tiny lookup table to one AMP (all rows having the same Primary Index), there is another advantage: No deadlocks for rowhash locking can happen?

LEAVE A REPLY

Please enter your comment!
Please enter your name here