Collect Statistics in Teradata – Maintenance
1. Maintenance Milestones before you decide
Now that you have defined, collected, and evaluated all statistics necessary and helpful, it is time to design the maintenance steps.
First, what are the speed and variance of your Teradata data warehouse objects? What is the most relevant time unit of measurement in it?
Your world is between a smooth, gradual growth or decline over time and a mostly unpredictable earthquake-like makeover you are never told in advance.
Second, how often are you granted time windows for statistics maintenance, and how predictable, stable, and how many subjects to debate? How much precedence can you give it to other tasks?
The answers to these questions decide how advanced your maintenance task could become.
Remember that any of the following approaches are not due to a collect statement’s nature but the environmental constraints you have.
2. Aging and Recollection
As statistics can become outdated in principle, the question is when to refresh them?
The need to give Teradata an update on table statistics results either from healthy development or structural changes. The former is the typical quantitative table content that evolves over the ordinary course of business. The latter are business-driven table reconstructions such as the addition or dismissal of table attributes or a “regime change” in the data leading to new values and any physical data model changes such as repartitioning or redefinition of indexes.
Here are recollection approaches we use(d) ourselves or have heard of:
- Just-so recollection of the entire collect combination set every couple of days or when someone “feels the need”.
- The complete recollection of the whole collect combination is set every n days or weeks.
- A partial recollection of the whole collects combination during a calendar time window to continue on the next calendar time window until one round is done.
- Triggered partial recollection based on the collection date.
- Triggered partial recollection based on the relative table size change.
We do not recommend defining a fixed time or date schedule for entire recollections, as the mere passage of time is typically not the driving force behind what makes statistics outdated. Instead, a sudden table size change, the insertion or deletion of data layers containing new values for individual fields, or a table’s recreation due to physical data model changes are a call to action.
The triggered relative size approach is a (semi-)automated way that spreads the recollection over time so that smaller time windows – of which there are typically more – can be used. If done automatically, the most equal spreading possible can be achieved.
But what are the triggering criteria?
It is time to recollect when there is a considerable difference between the estimated query subset and step figures and the actual result values.
3. A Collection statement register
We recommend creating a record table that holds this information to avoid losing the last set of collect combinations.
CREATE MULTISET TABLE P_XYZ_A.DM_STATS_REGISTER
(
TableName CHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
CollectCombination VARCHAR(250) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL
) PRIMARY INDEX ( TableName, CollectCombination );
The information in this table looks like this:
T1111_ENTITY1 L_ID
T1111_ENTITY1 PARTITION,L_ID
T1111_ENTITY1 L_ID,C_DT
T1111_ENTITY1 PARTITION
T1111_ENTITY1 C_DT
4. A triggered partial recollection specification:
Based on the register table content (see above), one could define views that return all statistics considered old or missing and collection statement templates for selective recollection.
The tricky part is to derive the current statistics collection state from the dbc tables.
Once you have the dbc input, you must decide when and what to collect or recollect.
For a recollection, when required and spread evenly across time, we use the following procedure:
Store the table size and date of that size in a separate register table as part of the daily activity.
Look at the dbc statistics information to decide the latest collection date of anyone statistic for the table.
Compare the current size of the table as of that latest collection date to the current size.
If that size difference is more than 10 percent, select the table and all actual statistics for recollection.
Key features of this recollection specification are:
- Only a part of the entire set of tables is ready for recollection at any time.
- A table reappears for recollection until it is done. As the specification will remind you, you do not have to remember what you did.
- According to their needs, tables are suggested with stable lookups never mentioned and space shooting stars up several times in a row.
There is a drawback, too:
If you recollect selectively and spontaneously, the chances are that outdated statistics become undetected for a while as the most recent recollection date is the base case.