Collect Statistics in Teradata – Maintenance
1. Maintenance Milestones before you decide
Now that you have defined, collected and evaluated all statistics necessary and helpful, it is time to design the maintenance steps.
First, what are the speed and variance of your Teradata data warehouse objects? What is the most relevant time unit of measurement in it?
Your world is anywhere between a smooth, gradual growth or decline over time and a mostly unpredictable earthquake-like makeover you are never told in advance.
Second, how often are you granted time windows for statistics maintenance and how predictable, stable and how many subjects to debate are they? How much precedence can you give it to other tasks?
The answers to these questions decide how advanced your maintenance task could become.
Keep in mind that any of the following approaches are not due to the nature of a collect statement itself, but due to the environmental constraints you have.
2. Aging and Recollection
As statistics can become outdated in principle, the question is when to refresh them?
The need to give Teradata an update on table statistics results either from healthy development or structural changes. The former is the typical quantitative table content evolve over the ordinary course of business. The latter are business-driven table reconstructions such as the addition or dismissal of table attributes or a “regime change” in the data leading to new kinds of values as well as any physical data model changes such as repartitioning or redefinition of indexes.
Here are recollection approaches we use(d) ourselves or have heard of:
- Just-so recollection of the entire collect combination set every couple of days or when someone “feels the need”.
- Complete recollection of the whole collect combination set every n days or weeks.
- Partial recollection of the whole collect combination during a calendar time window to continue on the next calendar time window until one round is done.
- Triggered partial recollection based on the collection date.
- Triggered partial recollection based on the relative table size change.
We do not recommend to define a fixed time or date schedule for entire recollections as the mere passage of time is typically not the driving force behind what makes statistics outdated. Rather, a sudden table size change, the insertion or deletion of data layers containing new values for individual fields or the recreation of a table due to physical data model changes are a call to action.
The triggered relative size approach is a (semi-)automated way that spreads the recollection over time so that smaller time windows – of which there are typically more – can be used. If done automatically, the most equal spreading possible can be achieved.
But what are triggering criteria?
It is time to recollect when there is a considerable difference between query subset and step figures estimated and the real result values.
3. A Collection statement register
In order not to lose what we have considered the last set of collect combinations, we recommend creating a record table that holds this information.
CREATE MULTISET TABLE P_XYZ_A.DM_STATS_REGISTER, NO FALLBACK,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
TableName CHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
CollectCombination VARCHAR(250) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL)
PRIMARY INDEX ( TableName, CollectCombination );
The information in this table looks like this:
4. A triggered partial recollection specification:
Based on the register table content (see above), one could define views that return all statistics considered old or missing as well as collection statement templates for selective recollection.
The tricky part here is to derive the current statistics collection state out of the dbc tables.
Once you have the dbc input at hand, you need to decide when and what to collect or recollect.
For a recollection when it is required and spread evenly across time, we use the following procedure:
Store the table size and date of that size in a separate register table as part of the daily activity.
Look at the dbc statistics information to decide the latest collection date of any one statistic for the table.
Compare the current size of the table as of that latest collection date to the current size from now on.
If that size difference is more than 10 percent, select the table and all actual statistics for recollection.
Key features of this recollection specification are:
- Only a part of the entire set of tables is ready for recollection at any time
- A table reappears for recollection until it is done. You do not have to keep in mind what you did as the specification will remind you
- Tables are suggested according to their needs, with stable lookups never mentioned, and space shooting stars up several times in a row
There is a drawback, too:
If you recollect selectively and spontaneously, the chances are that outdated statistics become undetected for a while as the most recent recollection date is the base case.