Compression in Teradata Columnar offers two main benefits: less permanent space used and fewer disk I/Os.
Compressing columns usually has a significant advantage over compressing rows: the values within a single column vary much less than the values across a row.
Teradata Columnar uses several compression methods that exploit this property.
Teradata Columnar & Run-Length Encoding
This compression method stores each run of identical column values only once and adds the range of consecutive rows that contain the value. Assuming that a column container for a date column contains the date ‘2015-10-29’ in rows 100-200, run-length encoding would store this information like this:
‘2015-10-29’;100-200
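The idea can be sketched in a few lines of Python. This is only an illustration of the technique, not Teradata's internal storage format; the function names and the sample column are made up for this example.

```python
from itertools import groupby


def run_length_encode(values, first_row=1):
    """Collapse consecutive equal values into (value, first_row, last_row) runs."""
    runs = []
    row = first_row
    for value, group in groupby(values):
        length = sum(1 for _ in group)
        runs.append((value, row, row + length - 1))
        row += length
    return runs


def run_length_decode(runs):
    """Expand the runs back into one value per row."""
    values = []
    for value, start, end in runs:
        values.extend([value] * (end - start + 1))
    return values


# Hypothetical container: rows 1-99, 100-200 and 201-203 hold three dates.
column = ["2015-10-28"] * 99 + ["2015-10-29"] * 101 + ["2015-10-30"] * 3
runs = run_length_encode(column)
```

Here 203 stored values shrink to three runs, and the second run is the pair from the example above: the value ‘2015-10-29’ with the row range 100-200.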
Teradata Columnar & Dictionary Encoding
If the column values are not repeated successively, another compression method can be applied: dictionary encoding. Dictionary encoding stores each distinct full value once in a dictionary and replaces it in the column with a short code. The dictionary entries have a fixed length, making navigation easy. Teradata Columnar keeps one dictionary per container.
For example, the customer segments “Business” and “Private” could be stored as dictionary entries 1 and 2:
1,“Business”,2,“Private”
The column values would be stored as 1,2,1,2,2,2,1,1,2,1,…
As you can see, this allows for reducing used disk space by mapping larger column values to smaller ones.
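A minimal Python sketch of this mapping follows. It assumes codes are assigned in order of first appearance, which is a choice made for this illustration and not a statement about how Teradata assigns them; the function names are hypothetical.

```python
def dictionary_encode(values):
    """Map each distinct value to a small integer code (1, 2, 3, ...)."""
    dictionary = {}
    codes = []
    for value in values:
        # Assumption: a new value receives the next free code.
        code = dictionary.setdefault(value, len(dictionary) + 1)
        codes.append(code)
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    """Translate the codes back into the original values."""
    reverse = {code: value for value, code in dictionary.items()}
    return [reverse[code] for code in codes]


segments = ["Business", "Private", "Business", "Private", "Private",
            "Private", "Business", "Business", "Private", "Business"]
dictionary, codes = dictionary_encode(segments)
```

With this input the dictionary holds the two entries from the example, and the codes list reproduces the stored sequence 1,2,1,2,2,2,1,1,2,1.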
Teradata Columnar & Delta Compression
While run-length and dictionary encoding are straightforward, Teradata Columnar also offers a more sophisticated method: delta compression.
If the column values of a container lie within a tight range, only each value's offset from the container's average value is stored.
Let’s assume the following column values:
10,20,50,100,20,10
Delta compression would store the encoded information like this (notice that the average column value is 35):
-25,-15,+15,+65,-15,-25
You will immediately think, “But what’s the advantage in used space if I replace the original values with these offsets?” The answer is the data type. If our column container holds BIGINT values (8 bytes each), the offsets may fit into a SMALLINT (2 bytes), saving 6 bytes per row.
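The arithmetic can be checked with a short Python sketch. It is an illustration under stated assumptions, not Teradata's implementation: the base is taken as the integer average of the container, and the helper names are invented for this example.

```python
SMALLINT_MIN, SMALLINT_MAX = -32768, 32767  # range of a 2-byte signed integer
BIGINT_BYTES, SMALLINT_BYTES = 8, 2


def delta_encode(values):
    """Return the base (integer average, an assumption) and the offsets from it."""
    base = sum(values) // len(values)
    offsets = [value - base for value in values]
    return base, offsets


def delta_decode(base, offsets):
    """Rebuild the original values from the base and the offsets."""
    return [base + offset for offset in offsets]


def fits_smallint(offsets):
    """True if every offset can be stored in a 2-byte signed integer."""
    return all(SMALLINT_MIN <= offset <= SMALLINT_MAX for offset in offsets)


values = [10, 20, 50, 100, 20, 10]
base, offsets = delta_encode(values)
saved_bytes_per_value = BIGINT_BYTES - SMALLINT_BYTES if fits_smallint(offsets) else 0
```

For the sample values the base is 35 and the offsets are -25, -15, 15, 65, -15, -25. All of them fit into a SMALLINT, so each stored value is 6 bytes smaller than the original BIGINT.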
Conclusion
The combined effect of the compression methods available in Teradata Columnar is a vast improvement over the usual multi-value compression (MVC) used for row-stored data.
By default, Teradata decides which compression algorithm to use (but this behavior can be changed). The compression method can vary across the columns of a table or even from container to container within a column. Multiple methods can even be applied to the same column at the same time.