First and foremost, Teradata SQL tuning requires us to find the cause of the performance issue. It is never the SQL statement, which is terrible, but one or more steps of the execution plan.
This article is not about real optimization techniques but shows you a practical approach for query optimization. My daily observation is that people tend to follow a trial and error approach, sometimes luckily ending up with a solution but not knowing what solved the problem anymore.
We have to approach the issue like detectives, tracking down the wrong steps in the execution plan and finding out what causes them. Isolated inspection is the proper approach here.
All SQL performance problems turn out to be related to one of the two root causes in an execution step: Uneven distribution of rows (skew) or wasteful workload. Most of the time, missing or stale statistics are the trigger.
- Take a snapshot of your SQL statement’s explain plan before, allowing you to notice any changes in the execution plan caused by your optimization activities.
- Check the completeness and up-to-dateness of your statistics. Do this even before taking any closer look at the execution plan. Add missing statistics and update the old ones. Watch out for steps with low or no confidence, but don’t assume that you can consistently achieve high confidence levels. Particular kind of conditions prevents high confidence (OR conditions, for example).
- Suppose statistics had to be refreshed or completed, run another explanation, and examine if the execution plan changed. If the optimizer has a new execution plan, rerun your query. The new execution plan could have solved your performance issue.
- If your performance problem did not dissolve, it is time to look at each step of the execution plan. While you could start with static analysis on the explain output, I prefer to start with real-time monitoring as it is usually the less time-consuming approach. Any real-time monitoring tool (like Viewpoint, PMON, DBMON) gives you the essential information needed.
- Execute your SQL statement and watch each step in real-time.
- Wasteful workload: Watch out for discrepancies between estimated rows and rows returned in each step. This will mislead the optimizer, causing it to choose the wrong data preparation (copy, rehash, etc.) and join strategies (product join instead of the merge join). If this is the case and all your statistics are fine, you need to start thinking about possible solutions like temporary tables, advanced indexing, partitioning, query rewriting, etc.
- Skewed data: If you detect a step in the execution plan, which is skewed, your base tables have a well-distributing primary index, and the statistics are excellent. A skewed intermediate spool probably causes the problem. Skewed spool happens for several reasons, as two joined tables moved to a common ROWHASH with only a few distinct values. There is no secret to solving such a problem. Suppose the goal is to avoid the skewed spool. In that case, you should consider all optimization techniques that can help you, like breaking up queries into smaller parts and splitting the query into two parts, one handling the skewed values and the other handling the un-skewed values.
- Repeat the above approach until no wrong steps are left. Don’t forget to set statistics after each impacting change on the SQL statement.
To summarize, optimization consists of the following steps:
Detect the wrong steps in the execution plan, figure out what is done wrong by the optimizer (bad join, etc.) and why it is wrong.
As soon as you know the reason, you can browse your optimization toolkit for the appropriate tool and use your creativity: Many roads lead to Rome.