Collect Statistics in Teradata – Maintenance
1. Maintenance Milestones before you decide
Having defined, collected, and evaluated all pertinent statistics, it is now time to devise maintenance procedures.
How fast and how erratically do the tables in your Teradata data warehouse change, and which time unit (hours, days, weeks) measures that change best?
Does your data evolve gradually, or does it go through sudden and unforeseeable upheavals, like earthquakes without warning?
How often are you given time slots for statistics maintenance? How consistent and reliable are these slots, and how much about them is still up for discussion? What priority does this task have compared with others?
The answers to these questions determine how complex your maintenance task will be.
Keep in mind that none of the following approaches is dictated by the nature of the COLLECT STATISTICS statement itself; each one is a response to the constraints of your environment.
2. Aging and Recollection
Statistics become outdated, so when should they be refreshed?
Updating Teradata table statistics is necessary either due to regular quantitative changes in table content or significant structural modifications. The former occurs during typical business operations, whereas the latter involves business-driven table reconstructions, such as adding or removing table attributes or changes in data leading to new values. Physical data model changes may also be required, such as repartitioning or index redefinition.
These are the recollection methods we have either used ourselves or come across:
- Ad hoc recollection of the entire collect combination set every couple of days or when someone “feels the need”.
- Complete recollection of the entire collect combination set every n days or weeks.
- Partial recollection of the collect combination set during one calendar time window, continued in the next calendar time window until one full round is done.
- Triggered partial recollection based on the collection date.
- Triggered partial recollection based on the relative table size change.
We advise against a rigid timetable for refreshing statistics, because statistics do not become outdated through the passage of time alone. Rather, events such as changes in table size, the insertion or deletion of slices of data that bring new values for specific columns, or the recreation of tables after physical data model changes must prompt a recollection.
The triggered relative size method is a (semi-)automated technique that distributes recollection over time. It gets by with smaller time windows, of which there are typically more available. When executed automatically, it achieves the most even distribution possible.
What else should trigger a recollection?
Whenever the optimizer's estimates for query steps differ significantly from the actual outcome, it is time to review the statistics involved.
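One way to find such disparities is the query log. The following is only a sketch under assumptions not stated in this article: step-level query logging (DBQL with STEPINFO) is enabled, the view DBC.QryLogStepsV exposes the columns EstRowCount and RowCount, and a factor of 10 between estimate and actual counts as "significant". Adjust the factor to your own tolerance.

```sql
-- Sketch: list query steps whose estimated row count is off by more than
-- a factor of 10 (hypothetical threshold) compared with the actual row count.
-- Assumes DBQL step logging is enabled so that DBC.QryLogStepsV is populated.
SELECT QueryID,
       StepLev1Num,
       StepName,
       EstRowCount,
       RowCount
FROM DBC.QryLogStepsV
WHERE RowCount > 0
  AND EstRowCount > 0
  AND (EstRowCount > 10 * RowCount      -- optimizer overestimated
       OR RowCount > 10 * EstRowCount)  -- optimizer underestimated
ORDER BY QueryID, StepLev1Num;
```

The tables touched by the reported steps are then candidates for a closer look at their statistics.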
3. A Collection statement register
We suggest maintaining a register table so that the most recent set of collect combinations is never lost.
CREATE MULTISET TABLE P_XYZ_A.DM_STATS_REGISTER
(
    -- Table on which the statistic is collected
    TableName CHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
    -- Comma-separated column list of one statistic, e.g. PARTITION,L_ID
    CollectCombination VARCHAR(250) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL
) PRIMARY INDEX ( TableName, CollectCombination );
The content of the table looks like this:
TableName        CollectCombination
T1111_ENTITY1    L_ID
T1111_ENTITY1    PARTITION,L_ID
T1111_ENTITY1    L_ID,C_DT
T1111_ENTITY1    PARTITION
T1111_ENTITY1    C_DT
4. A triggered partial recollection specification:
By examining the content of the register table, one can create views that display statistics that are either outdated or absent. Additionally, collection statement templates can be generated for targeted recollection.
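As an illustration of such a template, the query below turns every row of the register into a ready-to-run statement. It is a minimal sketch and assumes that the registered tables live in the same database as the register (P_XYZ_A) and that each CollectCombination holds a plain comma-separated column list, as in the sample rows above.

```sql
-- Sketch: generate one COLLECT STATISTICS statement per register row.
-- Assumption: the target tables are located in database P_XYZ_A.
-- TRIM removes the padding of the fixed-length CHAR(30) TableName.
SELECT 'COLLECT STATISTICS COLUMN (' || TRIM(CollectCombination)
       || ') ON P_XYZ_A.' || TRIM(TableName) || ';' AS CollectStatement
FROM P_XYZ_A.DM_STATS_REGISTER
ORDER BY TableName, CollectCombination;
```

For the register row with the combination PARTITION,L_ID this produces `COLLECT STATISTICS COLUMN (PARTITION,L_ID) ON P_XYZ_A.T1111_ENTITY1;`. Adding a WHERE clause on TableName restricts the output to the tables currently due for recollection.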
Deriving the current state of statistics collection from the DBC tables can be challenging.
Once you have the DBC input, you need to decide what to recollect and when.
To distribute the required recollection evenly over time, we use the following procedure:
- Record each table’s size and the corresponding date in a separate register as part of daily operations.
- Refer to the DBC statistics to determine the most recent collection date of any statistic on the table.
- Compare the table size on that latest collection date with its current size.
- Select the table and all its existing statistics for recollection if the size difference exceeds 10%.
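The steps above can be sketched in SQL as follows. This is an illustration, not a definitive implementation: the size register P_XYZ_A.DM_SIZE_REGISTER and its columns are hypothetical names introduced here, the statistics are read from DBC.StatsV (which assumes Teradata 14.0 or later), and table size is measured as permanent space from DBC.TableSizeV.

```sql
-- Hypothetical size register (not part of the original design).
CREATE MULTISET TABLE P_XYZ_A.DM_SIZE_REGISTER
(
    TableName    VARCHAR(128) CHARACTER SET UNICODE NOT CASESPECIFIC NOT NULL,
    SnapshotDate DATE NOT NULL,
    SizeBytes    DECIMAL(18,0) NOT NULL
) PRIMARY INDEX ( TableName );

-- Daily job: record the current size of every table,
-- summed over all AMPs (DBC.TableSizeV holds one row per AMP).
INSERT INTO P_XYZ_A.DM_SIZE_REGISTER (TableName, SnapshotDate, SizeBytes)
SELECT TableName,
       CURRENT_DATE,
       CAST(SUM(CurrentPerm) AS DECIMAL(18,0))
FROM DBC.TableSizeV
WHERE DatabaseName = 'P_XYZ_A'
GROUP BY TableName;

-- Tables whose size has changed by more than 10% since the most recent
-- collection of any of their statistics.
SELECT size_now.TableName,
       last_stats.LastCollectDate,
       size_then.SizeBytes AS SizeAtLastCollect,
       size_now.SizeBytes  AS CurrentSize
FROM (
        -- Most recent collection date of any statistic on the table
        SELECT TableName,
               CAST(MAX(LastCollectTimeStamp) AS DATE) AS LastCollectDate
        FROM DBC.StatsV
        WHERE DatabaseName = 'P_XYZ_A'
        GROUP BY TableName
     ) last_stats
JOIN P_XYZ_A.DM_SIZE_REGISTER size_then
  ON  size_then.TableName    = last_stats.TableName
  AND size_then.SnapshotDate = last_stats.LastCollectDate
JOIN P_XYZ_A.DM_SIZE_REGISTER size_now
  ON  size_now.TableName    = last_stats.TableName
  AND size_now.SnapshotDate = CURRENT_DATE
WHERE size_then.SizeBytes > 0
  AND ABS(size_now.SizeBytes - size_then.SizeBytes) > 0.10 * size_then.SizeBytes;
```

Note two limits of this sketch: a table without a size snapshot on its last collection date drops out of the inner join, and a table that was empty at that date is skipped by the `SizeBytes > 0` filter. Both cases would need separate handling.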
The essential elements of this specification for recollection are:
- Only a part of the entire set of tables is ready for recollection at any time.
- A table keeps reappearing for recollection until it is done. Because the specification reminds you, you do not have to keep track of what you have already done.
- Tables are suggested according to their needs: stable lookup tables are never mentioned, while tables whose space usage shoots up appear several times in a row.
However, there is a downside.
Selective, ad hoc recollection of individual statistics may cause outdated statistics to go unnoticed, because the most recent collection date on a table serves as the reference point.
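A simple check can make this case visible. The sketch below again assumes DBC.StatsV and the database P_XYZ_A; it lists tables on which the oldest and the newest statistic were collected on different dates, which is what a selective recollection leaves behind.

```sql
-- Sketch: tables whose statistics were not all collected on the same date.
-- A newer date on one statistic hides the older ones when only the
-- most recent collection date per table is used as reference.
SELECT TableName,
       MIN(LastCollectTimeStamp) AS OldestCollect,
       MAX(LastCollectTimeStamp) AS NewestCollect
FROM DBC.StatsV
WHERE DatabaseName = 'P_XYZ_A'
GROUP BY TableName
HAVING CAST(MIN(LastCollectTimeStamp) AS DATE)
     < CAST(MAX(LastCollectTimeStamp) AS DATE)
ORDER BY OldestCollect;
```

For the tables reported, using the oldest rather than the newest collection date as the reference would keep them in the recollection candidate list.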