Teradata Multivalue Compression – A Quick Approach
Unfortunately, Teradata (the company) does not offer any tool which would allow automatizing this task.
Usually, analysis requires counting the number of distinct values per table column. Counting is resource and time intense for huge tables.
Luckily, with Teradata 14.10 we have a less costly method available:
SHOW STATISTICS VALUES COLUMN <column> ON <table>;
Above statement outputs the most biased column values and many other useful statistics about a column. Usually, the biased values are great candidates for Teradata Multivalue compression.
The advantage of this approach is that no costly table analysis is required (compare this against the approach of counting the distinct column values for each column of a huge table), nevertheless it delivers astoundingly good compression suggestions.
By additionally using the number of NULL values per column, you should have a great starting point for a quick win regarding space reduction.
Nevertheless, as good as this method is for achieving quick results, it comes with some drawbacks:
- It only works for columns with collected and correct statistics.
- Unfortunately, the information used in the SHOW STATISTICS statement is taken from a binary object (“FieldStatistics”). There is no easy way to extract this information. As a substitute, we have to use the text output of the statement and parse it, to be able to automatize the process of creating compress statements.