Teradata Columnar compression provides two primary advantages: it reduces permanent storage utilization, and it reduces the number of disk input/output operations.
Compressing columns is typically more effective than compressing rows because the values within a single column vary little. Teradata Columnar employs various compression methods that leverage this property.
Teradata Columnar & Run-Length Encoding
This compression technique stores each column value once and supplements it with information about the subsequent rows that hold the same value. For instance, if a date column contains ‘2015-10-29’ in rows 100-200, run-length encoding records the value a single time together with the row range 100-200, instead of repeating it for all 101 rows.
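The idea can be sketched in a few lines of Python. This is only an illustration of run-length encoding in general, not Teradata's internal format; the function names `rle_encode` and `rle_decode`, the (value, first row, last row) layout, and the neighboring dates are assumptions made for the example.

```python
from itertools import groupby


def rle_encode(values, first_row=1):
    """Collapse consecutive repeats into (value, first_row, last_row) runs."""
    runs = []
    row = first_row
    for value, group in groupby(values):
        length = sum(1 for _ in group)  # number of consecutive rows with this value
        runs.append((value, row, row + length - 1))
        row += length
    return runs


def rle_decode(runs):
    """Expand the runs back into one value per row."""
    values = []
    for value, start, end in runs:
        values.extend([value] * (end - start + 1))
    return values


# Hypothetical date column: rows 100-200 all hold '2015-10-29'.
column = ["2015-10-28"] * 99 + ["2015-10-29"] * 101 + ["2015-10-30"] * 50
runs = rle_encode(column)
for run in runs:
    print(run)
```

In this sketch, 250 stored values shrink to three runs, and decoding the runs reproduces the original column exactly.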
Teradata Columnar & Dictionary Encoding
If column values are not consecutively repeated, dictionary encoding can be used for compression. This involves storing each unique value once in a dictionary and replacing the column values with short references to the dictionary entries. The entries in the dictionary are of a fixed length, making navigation simple. In Teradata Columnar, each container has its own dictionary.
Values for customers in the “Business” and “Private” segments can be saved as dictionary entries 1 and 2.
1, “Business”, 2, “Private”
The column values are then stored as a sequence of 1s and 2s, for example, 1, 2, 1, 2, 2, 2, 1, 1, 2, 1, and so on.
This mapping decreases disk space utilization by replacing long column values with short codes.
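A minimal Python sketch of this mapping follows. It assumes codes are handed out in order of first appearance, starting at 1; the helper names `dict_encode` and `dict_decode` are hypothetical, and the real per-container dictionary layout is not modeled.

```python
def dict_encode(values):
    """Return (dictionary, codes); each distinct value gets the next free code."""
    dictionary = {}  # value -> code, assigned in order of first appearance
    codes = []
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary) + 1
        codes.append(dictionary[value])
    return dictionary, codes


def dict_decode(dictionary, codes):
    """Translate the codes back into the original values."""
    reverse = {code: value for value, code in dictionary.items()}
    return [reverse[code] for code in codes]


segments = ["Business", "Private", "Business", "Private", "Private",
            "Private", "Business", "Business", "Private", "Business"]
dictionary, codes = dict_encode(segments)
print(dictionary)
print(codes)
```

Each eight- or seven-character string is stored once in the dictionary, while the column itself holds only the small integer codes.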
Teradata Columnar & Delta Compression
Teradata Columnar utilizes delta compression, a more sophisticated compression technique than the straightforward run-length and dictionary encoding methods.
If the column values in a container fall within a narrow range, only each value's deviation from the container's average value is stored.
With an average column value of 35, for example, delta compression stores each value as its offset from 35.
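The following Python sketch illustrates the principle. The sample values are invented for this example and were picked so that their average is 35; the functions `delta_encode` and `delta_decode` are hypothetical names, and rounding the average to an integer base is an assumption, not documented Teradata behavior.

```python
def delta_encode(values):
    """Return (base, offsets), where base is the rounded average of the values."""
    base = round(sum(values) / len(values))
    offsets = [value - base for value in values]
    return base, offsets


def delta_decode(base, offsets):
    """Rebuild the original values from the base and the stored offsets."""
    return [base + offset for offset in offsets]


# Hypothetical container contents with an average of 35.
values = [33, 36, 35, 34, 37]
base, offsets = delta_encode(values)
print(base, offsets)
```

The offsets stay small even when the original values are large, which is what makes a narrower data type possible.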
You may wonder how replacing the original values with smaller numbers saves space. The advantage lies in the data type. If the column container stores BIGINT values (8 bytes each), the offsets can be saved as SMALLINT values (2 bytes each), which saves 6 bytes per value.
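The arithmetic can be checked with Python's `struct` module, using an 8-byte signed integer to stand in for BIGINT and a 2-byte signed integer for SMALLINT. The sample values are made up, and the layout (one 8-byte base followed by 2-byte offsets) is an assumption for illustration rather than Teradata's actual on-disk format.

```python
import struct

# Hypothetical BIGINT values that lie close together.
values = [1_000_000_033, 1_000_000_036, 1_000_000_035, 1_000_000_034, 1_000_000_037]
count = len(values)

base = sum(values) // count  # integer average of the container
offsets = [value - base for value in values]

# 'q' is an 8-byte signed integer, 'h' is a 2-byte signed integer.
raw = struct.pack(f"<{count}q", *values)
packed = struct.pack("<q", base) + struct.pack(f"<{count}h", *offsets)

saved_per_value = struct.calcsize("<q") - struct.calcsize("<h")
print(len(raw), len(packed), saved_per_value)
```

The base value costs 8 bytes once per container, so the saving approaches 6 bytes per value as the number of values grows.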
Teradata Columnar’s compression methods go well beyond the multivalue compression (MVC) typically used for row-based data.
Teradata determines which compression algorithm is used, although this can be altered. Compression techniques can differ between the columns of a table and between the containers of a single column. Multiple techniques can even be combined within one column.