Whenever you have to analyze a vast Teradata table regarding Multivalue compression, this is a costly operation from a workload point of view.
Unfortunately, Teradata (the company) does not offer any tool that would automate this task.
Usually, analysis requires counting the number of distinct values per table column. Counting is resource and time intensive for huge tables.
Luckily, with Teradata 14.10, we have a less costly method available:
SHOW STATISTICS VALUES COLUMN <column> ON <table>;
The above statement outputs the most biased column values and many other useful statistics about a column. Usually, the biased values are great candidates for Teradata Multivalue compression.
This approach’s advantage is that no costly table analysis is required (compare this against the approach of counting the distinct column values for each column of a huge table). Nevertheless, it delivers excellent compression suggestions.
Using the number of NULL values per column should have a great starting point for a quick win regarding space reduction.
Nevertheless, as good as this method is for achieving quick results, it comes with some drawbacks:
- It only works for columns with collected and correct statistics.
- Unfortunately, the information used in the SHOW STATISTICS statement is taken from a binary object (“FieldStatistics”). There is no easy way to extract this information. As a substitute, we have to use the text output of the statement and parse it, to be able to automatize the process of creating compress statements.