First and foremost, Teradata SQL tuning requires us to find the cause of the performance issue. It is never the SQL statement as such which is bad, but one or more steps of the execution plan.
This article is not about real optimization techniques but shows you a practical approach for query optimization. My daily observation is that people tend to follow a trial and error approach, sometimes luckily ending up with a solution, but not knowing anymore what solved the problem.
We have to approach the issue like detectives, tracking down the wrong steps in the execution plan and finding out what causes them. Isolated inspection is the proper approach here.
All SQL performance problems turn out to be related to one of the two root causes in an execution step: Uneven distribution of rows (skew) or wasteful workload. Most of the times, missing or stale statistics are the trigger.
- Take a before snapshot of the explain of your SQL statement, allowing to notice any changes in the execution plan caused by your optimization activities.
- Check completeness and up-to-dateness of your statistics. Do this even before taking any closer look at the execution plan. Add missing statistics and update the old ones. Watch out for steps with low or no confidence, but don't assume that you can always achieve high confidence levels. Particular kind of conditions prevents high confidence (OR conditions for example).
- If statistics had to be refreshed or completed, run another explain and examine if the execution plan has changed. If the optimizer has a new execution plan, rerun your query. The new execution plan could have solved your performance issue.
- If your performance problem did not dissolve, it is time to take a closer look at each step of the execution plan.While you could start with a static analysis on the explain output, I prefer to start with real-time monitoring as it is most times the less time-consuming approach. Any real-time monitoring tool (like Viewpoint, PMON, DBMON) gives you the essential information needed.
- Execute your SQL statement and watch each step in real-time.
- Wasteful workload: Watch out for discrepancies between estimated rows and row returned in each step, as this will mislead the optimizer, causing it to choose the wrong data preparation (copy, rehash, etc.) and join strategies (product join instead of the merge join, etc.). If this is the case and all your statistics are fine, you need to start thinking about possible solutions like temporary tables, advanced indexing, partitioning, query rewriting, etc.
- Skewed data: If you detect a step in the execution plan which is skewed, your base tables have a well-distributing primary index and the statistics are excellent, a skewed intermediate spool probably causes the problem. Skewed spool happens for several reasons, like in the case of two joined tables moved to a common ROWHASH with only a few distinct values. There is no secret to solving such a problem. If the goal is to avoid the skewed spool, you should consider all optimization techniques which can help you, like breaking up queries into smaller parts, splitting the query into two parts, one handling the skewed values, the other handling the un-skewed values.
- Repeat the above approach until no wrong steps are left. Don't forget to set statistics after each impacting change on the SQL statement.
To summarize, optimization consists of the following steps:
Detect the wrong steps in the execution plan, figure out what is done wrong by the optimizer (bad join, etc.) and why it is wrong.
As soon as you know the reason, you can browse your optimization toolkit for the appropriate tool and make use of your creativity: Many roads lead to Rome.