What are the Goals of SQL Tuning?

The goal of Teradata SQL tuning is to cut resource usage. There are two measures we have to watch:

  • Disk IOs
  • CPU seconds

These are absolute values. They are not influenced by any concurrent system activity and stay stable for a particular execution plan.

Don’t use execution times as your optimization target. Many irrelevant factors will affect run times:

Session blocking, workload delays, heavy concurrent workload, etc.

The most expensive task for any RDBMS is to move data from the mass storage devices to memory.

Many of the techniques & ideas described below reduce the number of transferred data blocks (from disk to memory and vice versa). Some of them help to cut CPU consumption.

9 Teradata Tuning Tips to increase performance by at least 50%

1. Ensure Completeness and Correctness of Teradata Statistics

The most important optimization task is to aid the Teradata Optimizer with complete and correct statistics.

We have to pay particular attention to 3 basic situations and always collect full statistics:

  1. Non-indexed columns used in predicates:

    Missing statistics force the Optimizer to do a very inaccurate heuristic estimation.
  2. Skewed indexed columns:

    Random-AMP sampling on a skewed index results in wrong estimates and execution plans.
  3. Small tables:

    Estimations for small tables are poor when there are fewer table rows than AMPs. In this case, most AMPs will not contain any rows, and a random-AMP sample taken from an AMP with no or only a few rows gives a wrong estimation.

By following these three rules, many problems related to missing statistics will vanish.
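
As a sketch, this is what full statistics collection for the three situations could look like (table and column names are hypothetical):

-- 1. Non-indexed column used in a predicate
COLLECT STATISTICS COLUMN (OrderStatus) ON SalesDB.Orders;

-- 2. Skewed indexed column (e.g. a NUSI column with heavily repeated values)
COLLECT STATISTICS COLUMN (CustomerId) ON SalesDB.Orders;

-- 3. Small lookup table with fewer rows than AMPs
COLLECT STATISTICS COLUMN (CountryCode) ON SalesDB.CountryLookup;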

The goal of our Teradata SQL tuning activities is to find the right balance between good query plans and the time needed to maintain useful statistical information.

Discovering Missing Statistics

The most simple way to discover missing statistics is by turning on diagnostics:

DIAGNOSTIC HELPSTATS ON FOR SESSION;

The above statement adds statistics recommendations at the end of each explained statement:

EXPLAIN
SELECT * FROM <Table> WHERE Column = 'value';

Test each recommendation separately. Don’t add all recommendations at once; wisely choose the ones which improve query performance.

Ensure that no plan step is marked with “no confidence”. Steps with “no confidence” are a sure sign of heuristics on non-indexed columns, something we must avoid at all costs!

Detection of Stale Statistics

The Optimizer is doing an excellent job of detecting stale statistics and extrapolating.

Before release 14.10, Teradata detected table growth by comparing two random-AMP samples: one taken during statistics collection, the other during query execution.

Since Teradata Release 14.10, deleted and inserted rows can be tracked (the UDI counts). For extrapolation, the Optimizer will prefer UDI counts over random AMP sample comparison.

Still, the best way to ensure up-to-date statistics is to collect them often.

If we need to find stale statistics, we have to compare the estimations against the real row counts. There are two places where we can find this information:

  • The Execution Plan (by placing the EXPLAIN modifier in front of our query)
  • The Statistics Histogram (by executing the SHOW STATISTICS VALUES ON <TABLE> statement)

The statistics histograms give us almost the same information the Optimizer uses to create the execution plan:

  • The timestamp of the last statistics collection
  • Collected and random-AMP sampling row counts
  • Inserted and deleted rows since the last statistics collection
  • Maximum column values
  • Minimum column values
  • Biased column values

Why did I say “almost the same information”? Estimations improve each time the Optimizer gains new insight. Such insights can come from earlier join steps, single-table predicates, and aggregations.

Furthermore, different methods of extrapolation will adjust the estimations shown in the histograms.

If you need the estimations after extrapolation, you can use the following statement:

SHOW CURRENT STATISTICS VALUES ON <Table>;

Here is a “single table predicate” example. It demonstrates the usage of derived estimations:

SELECT * FROM <Table1> t01 INNER JOIN <Table2> t02 ON t01.Key = t02.Key WHERE t01.Column IN (1,2,3);

The retrieve step estimation for <Table1> is 3 distinct values. This information flows into the joining step. If the average number of rows per value in both tables is about 1 (unique data), the resulting spool of the join is estimated at three rows.

A straightforward approach to detect stale statistics is this:

  • Decompose the query into single retrieve and join steps
  • Figure out why step estimations and real numbers don’t match
    Statistics could be outdated, or they could be missing. It might even be that the estimation is a result of the way the Optimizer works.
    Here is an example:
SELECT * FROM <Table> WHERE Column = 1;

As value 1 is not present in <Column>, the Optimizer will estimate the result set to be the average number of rows per value. I guess many of you expected zero rows?
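
As a minimal sketch (again with hypothetical names), comparing a step estimation against the real row count for a single-table predicate could look like this:

-- Check the estimated rows in the retrieve step of the execution plan
EXPLAIN SELECT * FROM SalesDB.Orders WHERE OrderStatus = 'OPEN';

-- Compare against the real cardinality of the same predicate
SELECT COUNT(*) FROM SalesDB.Orders WHERE OrderStatus = 'OPEN';

If the two numbers differ significantly, the statistics on OrderStatus are stale or missing.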

More information is available here: Teradata Statistics Basics

My last piece of advice:

Keep a copy of both the execution plans – before and after the change – to see the impact of your change.

2. The Primary Index Choice

When it comes to Primary Index choice, we must create a fair balance, weighing opposing requirements against each other:

Even data distribution and join performance.

Two tables can only be joined directly if they have the same primary index on the join columns. In other words: the rows to be joined need to be on the same AMP.

Design your queries in a way that the Primary Index is used for joining as much as possible, as this is the cheapest way of joining.

AMP-local joining is also possible if the join condition includes additional columns that are not part of the primary index. But if the join condition does not include all primary index columns, the rows of one or both tables have to be relocated (and maybe sorted).

If you need a different primary index to improve performance, you can use volatile tables or temporary tables. Create them with the same structure and content as the original tables but with the needed primary index. The use of temporary tables is particularly useful if your query consists of many join steps.
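
Here is a minimal sketch of this technique, assuming two hypothetical tables SalesDB.Orders and SalesDB.Customer that are joined on CustomerId, where CustomerId is the primary index of Customer but not of Orders:

CREATE VOLATILE TABLE VT_Orders AS
(
   SELECT OrderId, CustomerId, Amount
   FROM SalesDB.Orders
)
WITH DATA
PRIMARY INDEX (CustomerId)
ON COMMIT PRESERVE ROWS;

COLLECT STATISTICS COLUMN (CustomerId) ON VT_Orders;

-- The join on CustomerId is now AMP-local; no redistribution is needed
SELECT t01.OrderId, t02.CustomerName
FROM VT_Orders t01
INNER JOIN SalesDB.Customer t02
   ON t01.CustomerId = t02.CustomerId;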

3. Teradata Indexing & Partitioning

Indexes give the Optimizer more data access paths. They are improving highly selective retrieve actions. Unfortunately, indexes consume permanent disk space, and they require maintenance when the underlying base table changes.

Recommendation:

If indexes are not used by the Optimizer and are not useful in the PDM design, drop them immediately. They will only waste space and resources.

A unique secondary index (USI) allows for direct row access (like the primary index). A non-unique secondary index (NUSI) requires a full index subtable scan on all AMPs.

The NUSI can be an advantage over base table access if the index subtable is smaller than the base table. Covering NUSIs are more useful than non-covering ones: no base table lookups are needed, and no costs arise for them.

The difference between an index and partitioning is that an index is a sub-table, while partitioning is another way of structuring the base table. Partitioning allows the Optimizer to limit access to the data blocks of the qualifying partitions. The advantage of partitioning is that partition elimination is always used, while index usage has preconditions. For example, the NUSI will not be used without statistics collected on the index columns.
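
Here is a sketch of a row-partitioned table (hypothetical names), showing how a WHERE condition on the partitioning column enables partition elimination:

CREATE MULTISET TABLE SalesDB.Sales_PPI
(
   SalesId   INTEGER NOT NULL,
   SalesDate DATE FORMAT 'YYYY-MM-DD' NOT NULL,
   Amount    DECIMAL(18,2)
)
PRIMARY INDEX (SalesId)
PARTITION BY RANGE_N (SalesDate BETWEEN DATE '2020-01-01' AND DATE '2025-12-31' EACH INTERVAL '1' MONTH);

-- Only the January 2024 partition is scanned, not the whole table
SELECT SUM(Amount)
FROM SalesDB.Sales_PPI
WHERE SalesDate BETWEEN DATE '2024-01-01' AND DATE '2024-01-31';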

Still, there is a significant advantage of indexing: We create the index; we check if it’s used. We drop it if it’s not used. Have you ever partitioned a 200 Terabyte table for test reasons? I guess this is not what we like to do.

Another disadvantage of partitioning comes to my mind:

Join performance worsens if partitions of both tables don’t match or one of the tables is not partitioned.

Whenever working with partitioning, you have to keep the data warehouse architecture in mind. Decide if your solution fits into it. Partitioned tables use different join techniques. If tables have different partitions, this has an adverse impact on join performance!

Conclusion: There is no one-size-fits-all method. You have to check what works best for you!

4. Query Rewriting

Often query performance improves when the query is rewritten. Here are some ideas:

  • DISTINCT instead of GROUP BY, depending on the number of distinct values
  • UNION ALL instead of UNION (getting rid of a sort step; see the sketch below)
  • Splitting a skewed query into two parts: one handling the skewed values, the other dealing with the rest
  • Adding WHERE conditions to allow for partition elimination
  • Removing unreferenced columns and expressions from the select list (could help to cut joins)
  • Converting outer joins into inner joins
  • Breaking up the SQL into smaller pieces (using volatile tables)
  • Getting rid of joins on expressions
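
As an example, here is a sketch of the UNION ALL rewrite (hypothetical tables). It produces the same result without the duplicate-elimination sort step, provided duplicates are impossible or acceptable:

-- Before: UNION forces an additional sort and duplicate elimination
SELECT CustomerId FROM SalesDB.Orders_2023
UNION
SELECT CustomerId FROM SalesDB.Orders_2024;

-- After: UNION ALL skips the sort step
SELECT CustomerId FROM SalesDB.Orders_2023
UNION ALL
SELECT CustomerId FROM SalesDB.Orders_2024;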

Query rewriting can improve performance even when all other techniques fail. Still, it often requires an understanding of the business logic (“can I replace this left join with an inner join?”).

It is possible to rewrite a query in a purely technical way, but understanding its business logic reveals more tuning opportunities.

See also our article on query rewriting here: Teradata Tuning – Query Rewriting

5. Physical Data Model Design

I was not sure if I should add the physical data model to this list, for one reason: often we can’t make any significant changes to the physical database design, because too many consumers on top of the database (such as reporting tools and data marts) would be affected by a redesign.

Still, if we can improve the physical model, this is one of the most productive changes.

The best advice I can give:

Keep your core model normalized and denormalize in your data marts. Many of you will not agree; I can live with this. My experience is that early denormalization causes bad performance, especially when done without an apparent reason.

Check all columns of all tables to see whether the stored information can be broken down further, and ensure that columns used for joining have the same data types and character sets. Columns which contain more than one piece of information force the user to join on expressions. Most likely, the Optimizer will then use neither statistics nor the primary index for joining.

Ensure that all primary key columns are defined as NOT NULL. Add default values where appropriate. If you store codes, there is no reason to use UNICODE. It will just waste space.

Apply multi-value compression on all tables: more rows can fit into each data block. Data blocks are the smallest unit transferred between disk and memory; more rows per data block lead to fewer disk IOs and better performance.
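
A sketch of multi-value compression on frequently occurring values (hypothetical table; the value lists have to be derived from your actual data demographics):

CREATE MULTISET TABLE SalesDB.Orders_MVC
(
   OrderId     INTEGER NOT NULL,
   OrderStatus CHAR(10) CHARACTER SET LATIN
               COMPRESS ('OPEN', 'CLOSED', 'SHIPPED'),
   CountryCode CHAR(2) CHARACTER SET LATIN
               COMPRESS ('AT', 'DE', 'CH')
)
PRIMARY INDEX (OrderId);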

Please consider that the above advice can only be a loose collection of ideas about how to fix a broken data model.

6. Make Use of Teradata-Specific Features

There are several unique optimization opportunities which you should consider:

  • Using MULTISET tables can decrease Disk IOs
  • Use Multi-Statement Requests, as this avoids the usage of the transient journal and allows block optimization
  • Use CLOB columns instead of VARCHAR if you seldom select the column. Teradata stores CLOBs in sub-tables
  • DELETE all rows instead of dropping and recreating tables
  • Use ALTER TABLE instead of INSERT…SELECT into a copy
  • Use MERGE INTO instead of separate INSERT and UPDATE statements (see the sketch below)
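
Here is a sketch of the MERGE INTO approach (hypothetical tables; note that the ON condition has to match the primary index of the target table, assumed here to be CustomerId):

MERGE INTO SalesDB.Customer AS tgt
USING SalesDB.Customer_Stage AS src
   ON tgt.CustomerId = src.CustomerId
WHEN MATCHED THEN
   UPDATE SET CustomerName = src.CustomerName
WHEN NOT MATCHED THEN
   INSERT (CustomerId, CustomerName)
   VALUES (src.CustomerId, src.CustomerName);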

7. Real-Time Monitoring

Observing a query while it’s running allows us to detect the critical steps. Most people use Viewpoint for real-time monitoring; I prefer another tool called dbmon, written by a guy from Teradata Austria (I hate the slowness of Viewpoint).

Bad SQL performance is caused by either:

  • Skewed steps, or
  • Stale and missing statistics, which fool the Optimizer into wrong join decisions: product joins instead of merge joins, duplicating instead of rehashing

That’s my way of real-time monitoring:

I wait for a step that is either skewed or in which the estimated input rows to a join don’t make sense (such as two spools with millions of rows each being joined with a product join). I concentrate my optimization efforts on these steps.

If the skew is the issue, I will analyze the join column value skew. If estimations are wrong, I will go over the statistics and make sure that they are up to date.

Issues with statistics can be fixed quickly. Skew issues can be quite stubborn. Query rewriting is always my last option unless I find something foolish and easy to repair.

8. Comparison of Resource Usage

Always measure resource usage before and after the optimization. As I said earlier: query run times are not a reliable measure!

Here is a SQL query you can use in your daily work to extract the relevant metrics from the query log. You have to set a different QUERYBAND for each query version you are running to be able to distinguish them. You need “select” access to “DBC.DBQLOGTBL”.

SET QUERY_BAND = 'Version=1;' FOR SESSION;
SELECT
   AMPCPUTIME,
   (FIRSTRESPTIME-STARTTIME DAY(2) TO SECOND(6)) RUNTIME,
   SPOOLUSAGE/1024**3 AS SPOOL_IN_GB,
   CAST(100-((AMPCPUTIME/(HASHAMP()+1))*100/NULLIFZERO(MAXAMPCPUTIME)) AS INTEGER) AS CPU_SKEW,
   MAXAMPCPUTIME*(HASHAMP()+1) AS CPU_IMPACT,
   AMPCPUTIME*1000/NULLIFZERO(TOTALIOCOUNT) AS LHR
FROM
   DBC.DBQLOGTBL
WHERE
   QUERYBAND LIKE '%Version=1;%';

The query will return:

  • The total CPU Usage
  • The Spool Space needed
  • The LHR (ratio between CPU and IO usage)
  • The CPU Skew
  • The Skew Impact on the CPU

The goal is to cut total CPU usage, consumed spool space and skew.
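
A minimal sketch of such a comparison: run each version of the query under its own query band, then compare both versions side by side:

SET QUERY_BAND = 'Version=1;' FOR SESSION;
-- ... run query version 1 ...

SET QUERY_BAND = 'Version=2;' FOR SESSION;
-- ... run query version 2 ...

SELECT
   QUERYBAND,
   SUM(AMPCPUTIME)         AS TOTAL_CPU,
   SUM(TOTALIOCOUNT)       AS TOTAL_IO,
   MAX(SPOOLUSAGE)/1024**3 AS MAX_SPOOL_GB
FROM
   DBC.DBQLOGTBL
WHERE
   QUERYBAND LIKE '%Version=%'
GROUP BY 1;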

9. Tactical Workload

Tactical workload requires a very particular skillset. The best tuner will fail if he doesn’t recognize that he is dealing with a tactical workload. I have seen complete projects failing because developers ignored this important fact.

The skill set required here differs entirely from the one needed for strategic workload.

I strongly recommend reading this post, which explains all the details:

Tactical Workload Tuning on Teradata


Teradata Statistics Case Study

The Optimizer usually does a good job when it comes to the usage of statistics. Nevertheless, sometimes it pays off to take a closer look at the execution plan and the Optimizer’s cardinality estimations.

Since Teradata 14.10, I have gotten used to including the SHOW STATISTICS statement in my analysis, as its output metrics can help a lot in discovering statistics problems.

Teradata Statistics – Avoid the Heuristics

The following case study shows how the Optimizer can be convinced (or forced) to use collected statistics instead of applying inaccurate heuristics.

The case study is based on the following two tables:

SHOW TABLE TheDatabase.TableOneDate;

CREATE SET TABLE TheDatabase.TableOneDate
(
OneDate DATE FORMAT 'YYYY-MM-DD'
) UNIQUE PRIMARY INDEX ( OneDate );

SELECT COUNT(*) FROM TheDatabase.TableOneDate;
--> 1 Row

SHOW TABLE TheDatabase.TheCalendar;

CREATE MULTISET TABLE TheDatabase.TheCalendar
(
CalendarDate DATE FORMAT 'yyyy-mm-dd'
) PRIMARY INDEX ( CalendarDate );

SELECT COUNT(*) FROM TheDatabase.TheCalendar;

--> 36,889 Rows

I removed all columns of both tables that are not relevant to this example, and collected statistics on the remaining columns:

COLLECT STATS ON TheDatabase.TableOneDate COLUMN OneDate;
COLLECT STATS ON TheDatabase.TheCalendar COLUMN CalendarDate;


The following query is used in several hundred badly performing reports and therefore caught my attention (it’s not the exact query, but you should get the idea):

Explain SELECT * FROM TheDatabase.TheCalendar
WHERE CalendarDate >=
(
SELECT
OneDate
FROM TheDatabase.TableOneDate
)

1) First, we lock a distinct TheDatabase.”pseudo table”
for read on a RowHash to prevent global deadlock for
TheDatabase.TableOneDate.
2) Next, we lock a distinct TheDatabase.”pseudo table” for
read on a RowHash to prevent global deadlock for
TheDatabase.TheCalendar.
3) We lock TheDatabase.TableOneDate for read, and we
lock TheDatabase.TheCalendar for read.
4) We do an all-AMPs RETRIEVE step from
TheDatabase.TableOneDate by way of an all-rows
scan with no residual conditions into Spool 1 (all_amps), which is
built locally on the AMPs. The size of Spool 1 is estimated with
high confidence to be 1 row (25 bytes). The estimated time for
this step is 0.03 seconds.
5) We do an all-AMPs DISPATCHER RETRIEVE step from Spool 1 (Last Use)
by way of an all-rows scan and send the rows back to the
Dispatcher. The size is estimated with high confidence to be 1
row. The estimated time for this step is 0.01 seconds.
6) We do an all-AMPs RETRIEVE step from
TheDatabase.TheCalendar by way of an all-rows scan with a
condition of (“TheDatabase.TheCalendar.CalendarDate >= :%SSQ20”)
into Spool 2 (group_amps), which is built locally on the AMPs.
The size of Spool 2 is estimated with no confidence to be 12,297
rows (4,943,394 bytes). The estimated time for this step is 0.04
seconds.
7) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 2 are sent back to the user as the result of
statement 1. The total estimated time is 0.08 seconds.

Although the above execution plan doesn’t look that bad, there was something I did not like about it: the estimated number of rows in step 6, namely 12,297, was much too low. In reality, the result set has 32,482 rows.

The reason is that the Optimizer is not using the available statistics (which we collected above) but applies heuristics (with “no confidence”):

12,297 (estimation) / 36,889 (table cardinality) = 33%

You can easily prove this with a similar setup: the Optimizer will always estimate the number of rows in the result set to be 33% of the total table rows.

Below you can see the statistics histogram for the collected statistics on CalendarDate:

COLLECT STATISTICS
COLUMN ( CalendarDate )
ON TheDatabase.TheCalendar
VALUES
(
/** SummaryInfo **/
/* NumOfBiasedValues */ 0,
/* NumOfEHIntervals */ 249,
/* NumOfHistoryRecords */ 7,
/* MinVal */ DATE '2000-01-02',
/* MaxVal */ DATE '2100-12-31',
/* ModeVal */ DATE '2000-01-02',
/* HighModeFreq */ 1,
/* NumOfDistinctVals */ 36889,
/* NumOfRows */ 36889,
/** Interval: MaxVal, ModeVal, ModeFreq, LowFreq, OtherVals, OtherRows **/
/* 1 */ DATE '2000-10-04', DATE '2000-01-02', 1, 1, 276, 276,
..
/* 249 */ DATE '2100-12-31', DATE '2100-09-08', 1, 1, 114, 114,

While it seems strange that the Optimizer is not using the available statistics, I had to accept this fact. I changed the query, replacing the subquery with a date literal (2012-01-26 is the date returned by the subquery of the previous query):

Explain SELECT * FROM TheDatabase.TheCalendar
WHERE CalendarDate >=
(
SELECT
DATE '2012-01-26'
)

1) First, we lock a distinct TheDatabase.”pseudo table”
for read on a RowHash to prevent global deadlock for
TheDatabase.TheCalendar.
2) Next, we lock TheDatabase.TheCalendar for read.
3) We do an all-AMPs RETRIEVE step from
TheDatabase.TheCalendar by way of an all-rows scan with a
condition of (“TheDatabase.TheCalendar.CalendarDate >= DATE
'2012-01-26'”) into Spool 1 (group_amps), which is built locally
on the AMPs. The size of Spool 1 is estimated with high
confidence to be 32,482 rows (1,006,942 bytes). The estimated
time for this step is 0.03 seconds.
4) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.03 seconds.

The new execution plan shows that the Optimizer is using the collected statistics, and correctly estimates the result set rows. Unfortunately, using a literal is not an option for my report optimization task.

To get a better understanding of how the Optimizer works, I changed the query again, selecting the literal from a “dummy table”:

Explain SELECT * FROM TheDatabase.TheCalendar
WHERE CalendarDate >=
(
SELECT DATE '2012-01-26' FROM
(
SELECT 1 AS x
) x
)

1) First, we lock a distinct TheDatabase.”pseudo table”
for read on a RowHash to prevent global deadlock for
TheDatabase.TheCalendar.
2) Next, we lock TheDatabase.TheCalendar for read.
3) We do an INSERT into Spool 2.
4) We do an all-AMPs RETRIEVE step from Spool 2 (Last Use) by way of
an all-rows scan into Spool 1 (used to materialize view, derived
table, table function or table operator x) (group_amps), which is
built locally on the AMPs. The size of Spool 1 is estimated with
high confidence to be 1 row (22 bytes). The estimated time for
this step is 0.01 seconds.
5) We do a group-AMP RETRIEVE step from Spool 1 (Last Use) by way of
an all-rows scan into Spool 3 (group_amps), which is built locally
on that AMP. The size of Spool 3 is estimated with high
confidence to be 1 row (25 bytes). The estimated time for this
step is 0.01 seconds.
6) We do a group-AMP DISPATCHER RETRIEVE step from Spool 3 (Last Use)
by way of an all-rows scan and send the rows back to the
Dispatcher. The size is estimated with high confidence to be 1
row. The estimated time for this step is 0.01 seconds.
7) We do an all-AMPs RETRIEVE step from
TheDatabase.TheCalendar by way of an all-rows scan with a
condition of (“TheDatabase.TheCalendar.CalendarDate >= :%SSQ20”)
into Spool 4 (group_amps), which is built locally on the AMPs.
The size of Spool 4 is estimated with no confidence to be 12,297
rows (381,207 bytes). The estimated time for this step is 0.03
seconds.
8) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 4 are sent back to the user as the result of
statement 1.

The execution plan shows that we are back to the heuristics, which is not what I wanted. By optimizing the “calendar view part” shared by all my reports, I expected an improvement for all of them.

Finally, I came up with the following query, using a correlated subquery:

Explain SELECT * FROM TheDatabase.TheCalendar t01
WHERE EXISTS
(
SELECT
*
FROM TheDatabase.TableOneDate t02
WHERE t01.CalendarDate >= OneDate
)

1) First, we lock a distinct TheDatabase.”pseudo table”
for read on a RowHash to prevent global deadlock for
TheDatabase.t01.
2) Next, we lock a distinct TheDatabase.”pseudo table” for
read on a RowHash to prevent global deadlock for
TheDatabase.t02.
3) We lock TheDatabase.t01 for read, and we lock
TheDatabase.t02 for read.
4) We do an all-AMPs SUM step to aggregate from
TheDatabase.t02 by way of an all-rows scan with no
residual conditions. Aggregate Intermediate Results are computed
globally, then placed in Spool 3.
5) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
an all-rows scan into Spool 2 (all_amps), which is duplicated on
all AMPs. The size of Spool 2 is estimated with high confidence
to be 72 rows (1,224 bytes).
6) We do an all-AMPs JOIN step from TheDatabase.t01 by way
of an all-rows scan with no residual conditions, which is joined
to Spool 2 (Last Use) by way of an all-rows scan.
TheDatabase.t01 and Spool 2 are joined using an
inclusion product join, with a join condition of (
“TheDatabase.t01.CalendarDate >= OneDate”). The
result goes into Spool 1 (group_amps), which is built locally on
the AMPs. The size of Spool 1 is estimated with low confidence to be
32,482 rows (1,006,942 bytes). The estimated time for this
step is 3.77 seconds.
7) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1.

The execution plan changed completely, but this plan allows the Optimizer to use the statistics: the estimation matches the number of rows in the result set exactly.

Although the new execution plan involves a product join, this is not a big deal because of the small amount of data. More important is that the estimations are excellent now. As the “calendar view part” is cross joined several times in each report, having correct estimates is crucial.

I hope this case study motivates you to dive deeper into the Optimizer’s details. As mentioned above, especially the new SHOW STATISTICS command offers a lot of useful information.

Teradata Performance Optimization – Part 3 (Fixing the broken Data Model)

Cost-cutting, unfortunately, leads to the unpleasant situation that many clients save on the wrong side, starting Data Warehouse projects where they should finish them.

As a Teradata Performance Optimization Analyst, you probably will be confronted with scenarios where there is no maintained data model at all, unclear business specifications, unyielding and incomplete mappings.  You can consider yourself lucky for every piece of information you can squeeze out of your customer’s working environment.

In my experience, most of the time you are put in the position of being responsible solely for performance, a task that is unfortunately often deemed exclusively technical.

However, although several performance problems can be fixed with purely technical expertise, you will probably end up with a lot of workarounds that do not correct the root causes of your performance problems.

Although such a purely technical approach may be applied for a very long time, adding up the costs at the end of the day will leave you in shock! Unfortunately, this head-in-the-sand policy is daily routine in many companies.

As a performance specialist, you have to be business analyst, data modeler, and developer at once. I think this is a very important insight: performance optimization often evolves into fixing a broken data model.

It is your task to make up for past failures in all these areas of expertise. Try to contact the people who were involved in or responsible for these roles at this client.

Sadly, you will often be confronted with uncooperative behaviour, as people tend to defend their petty areas of expertise. At the end of the day, you are questioning their past work results. You are on a very delicate mission, and strong management support will make your life much easier.

Fixing a poorly designed data model is like time travel. You have to go back to the start of the project, get to know the original business requirements, and question how and why they were transformed into the existing data model. You have to develop a habit of questioning all past decisions.

Most times it was a lack of budget, a wrong assignment of people to roles, a lack of time, or simply missing experience that turned the project into a big mess and a failed state. Still, in my opinion, the most significant cause is over-specialization of project members. Over-specialization leads to an evaporation of responsibilities: everybody shifts problems back and forth, and lots of resources are wasted finding the responsible person before a problem finally is solved.

One approach I prefer, if I know that only a redesign of the data model can solve the performance issues, is to create a small prototype demonstrating the improvements.

I would take a manageable subject area and redesign the whole chain. Using such a prototype as a communication tool can make the difference between getting the chance to fix the performance problem and getting an answer like “would be nice, but we don’t have the budget”. The more tangible your approach, the better.

As always, success has to be measurable. Often, reporting is at the end of the chain, and making some reports perform better on top of your prototype is a good starting point for showing your expertise in Teradata performance optimization.

I hope the main message of this article is clear:

These days, being just a highly specialized developer is not enough. As a performance specialist, you have to understand the entire data warehouse life cycle.
