Tactical workload tuning is a particular discipline on Teradata. Tactical workload requires a unique skillset and tools.
In this article, I will show you the skills needed to optimize the tactical workload.
You will learn how to use the proper tools and the “tactical workload skill set.”
The Difference between Tactical and Decision Support Workload
Decision support workload typically involves full table scans (FTS) and all-row joins -usually between big tables.
Indexing (NUSI, USI, etc.) can improve performance, but all table rows are accessed more often.
Here is an example of a decision support query:
SELECT CustomerGroupName, SUM(Revenue)
t01.GroupCd = t02.GroupCd
GROUP BY 1;
The tactical workload is characterized by single-row access or access to a limited number of rows. Here is an example of a tactical query:
WHERE SalesId = 100;
Optimized tactical queries need indexed access paths, such as UPI, NUPI, NUSI, USI, Join Indexes, or a single row partition. The primary goal is to have an execution plan with each step, only delivering one or a few rows.
Your query’s tactical selection criteria have to be applied as early as possible in the execution plan.
While this allows avoiding deadlocks entirely, serialization might not be useable in any case for performance reasons.
I used the solution to rewrite the most executed statements, increasing the database’s number of changes. By keeping the number of changes small, I was able to prevent deadlocks.
With a proper query design, tactical query execution is stable, and performance does not decrease if table cardinality is growing, while decision support queries usually become slower with a growing amount of data:
Tactical query performance does not improve with the number of Teradata nodes. Adding more nodes to your Teradata system will not impact the performance of your single tactical query. Nevertheless, adding nodes to your system allows more tactical queries to be executed at the same time.
The performance of decision support queries improves with the number of nodes. Assuming a perfect data distribution (no skewing), doubling the number of nodes can speed up your query by a factor of two (nevertheless, this is pure numbers. No system is working 100% in parallel and without any skewing effects).
The Tactical Workload Toolkit
Besides optimizing the retrieve steps with indexed access to single rows (USI, UPI, NUSI, Join Index, etc.), we have to optimize the joins. Join optimization means to minimize the number of AMPs involved in the join operations.
For example, the below query’s execution plan shows that the selected row is accessed from only one AMP (single-AMP join step):
SELECT * FROM Sales s INNER JOIN SalesDetails sd ON s.SalesId = sd.SalesId WHERE s.SalesId = 100; 1)
First, we do a single-AMP JOIN step from Indexing.t01 by way of the primary index “Sales.s.SalesId = 100”
with no residual conditions, which is joined to Sales.SalesDetails by way of the primary index “Sales.sd.SalesId = 100″
with a residual condition of (“Sales.SalesDetails.SalesId = 100”). Sales.Sales and Sales.SalesDetails
are joined using a merge join, with a join condition of (“Sales.Sales.SalesId = Sales.SalesDetails.SalesId”).
The result goes into Spool 1 (one-amp), which is built locally on that AMP.
The size of Spool 1 is estimated with low confidence to be 151 rows.
The above query will always perform identical (given the execution plan is not changing). We want to achieve this in tactical workload tuning: A stable direct access to the data.
While the above example was built around a merge join, the most suitable join type for the tactical workload is the Nested Join.
When designing your physical data model, you should consider a design that allows the usage of Nested Joins.
Business users often need to retrieve information based on the natural key and not the surrogate keys. It’s always a good idea to think about keeping the natural keys in your physical data model when dealing with the tactical workload!
Here is an example of a physical data model that allows the usage of Nested Joins.
CREATE TABLE Sales
ProductId INTEGER NOT NULL,
SalesId INTEGER NOT NULL
) PRIMARY INDEX (ProductId );
CREATE TABLE Products
ProductId INTEGER NOT NULL,
) PRIMARY INDEX (ProductId )
UNIQUE INDEX (ProductCode);
In our example above, the data is accessed via the natural key “ProductCode” and not the surrogate key (ProductId). Therefore, we use a USI to create a direct data access path via the natural key:
SELECT * FROM Products p INNER JOIN Sales s ON p.prod_id = s.prod_id WHERE ProductCode = ‚Private‘
1)First, we do a two-AMP JOIN step from Sales.Products by way of unique index # 4
“Sales.p.ProductCode = ‘Private’" with no residual conditions, which is joined to Indexing.s by
The way of unique index # 4 “Sales.s.ProductId = Sales.p.ProductId" with no residual conditions.
Sales.p and Sales.s are joined using a nested join, with a join condition of ("(1=1)").
The result goes into Spool 1 (one-amp), which is built locally on the AMPs.
As you can see, the tactical query is resolved with a Nested Join and the USI on column ProductCode.
Forcing all Rows to one AMP
Another exciting tool we have to execute tactical queries against little tables is to force all rows to one AMP by adding a column to the small table, populating it with just one value. This column has to be the primary index of the table.
Populating the primary index column with just one value will force all table rows to the same AMP, and if the table is sufficiently small, even into the same data block.
If everything works like expected, this results in a single-AMP retrieve step instead of an All-AMP scan.
If there are many concurrent tactical queries executed against the table simultaneously, the entire table will stay in memory, avoiding any disk drive access.
If you think about using this approach for several tables, use different values for the primary index of the “dummy” column to avoid an overload of a single AMP.
A Case Study: Tactical Workload and Join Indexes
I was recently involved in an optimization project for a web-frontend tool with Teradata as the back-end database.
During the development process, nobody had considered that the queries executed against the database are tactical workload. Most of the workload the tool generated was full table scans.
The web tool had been tested on a development system, but when it went into production, and dozens of people started to work in parallel, together with all the other workloads, performance severely. The real problem was not the size of the involved tables, as all tables were relatively small.
The real problem was a mix of end-user expectations when navigating in the web frontend and the workload management setup, which did not classify the tool queries as tactical workload.
Users expected an average response time of just a few seconds, which could not be achieved with full table scans on a loaded system (I understood the users. Who wants to wait 30 seconds to save just one client’s information?)
Run times for the main activities were at best 1-2 seconds but increased to a couple of minutes on a busy system. Furthermore, workload delays added to the overall response times.
I began to add unique secondary indexes, but as the tables were tiny, the optimizer decided that a full table scan of the base table would be cheaper than the indexed access.
Finally, it turned out that the usage of join indexes was the solution. As the tool required different access paths to the data, I created several 1:1 join indexes per table for each access path, only altering the join index’s primary index.
When adding join indexes, we have to keep an eye on the impact of an update, insert, and delete statement. In my case, it was no problem, as the tables were small.
What I experienced was another issue: The vast number of join indexes caused a lot of deadlocks.
There are several ways to overcome this problem. For example, we can serialize the user session activities by using a “serialization” table. Each session has to get a write lock for each activity on this “serialization” table before applying any changes to the data.
In my project, I was able to minimize deadlocks by rewriting the critical pieces of code, reducing the number of changes (updates, deletes, inserts) done on the tables.
The last hint: Don’t forget to collect statistics on the join indexes. Otherwise, they may not be used!
If you have any questions about all this, please ask in the comments! I’ll be paying close attention and answering as many as I can. Thank you for reading. Whatever this blog has become, I owe it all to you.