Teradata Columnar Usage Guide For Tuners

Teradata Columnar compression provides two primary advantages: it reduces permanent storage use, and it reduces disk input/output operations.

Compressing by column typically works better than compressing by row because the values within a single column vary far less than the values across a row. Teradata Columnar employs several compression methods that exploit this property.

Teradata Columnar & Run-Length Encoding

Run-length encoding stores a repeated column value once, together with the range of consecutive rows that hold that value. For instance, if a date column contains ‘2015-10-29’ in rows 100-200, run-length encoding records this as follows:

‘2015-10-29’;100-200
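
The idea can be illustrated with a minimal Python sketch. It is not Teradata's internal implementation: the function names `rle_encode` and `rle_decode`, the 1-based row numbering, and the sample column are all assumptions made for illustration.

```python
from itertools import groupby


def rle_encode(values, first_row=1):
    """Collapse consecutive equal values into (value, start_row, end_row) runs."""
    runs = []
    row = first_row
    for value, group in groupby(values):
        length = sum(1 for _ in group)
        runs.append((value, row, row + length - 1))
        row += length
    return runs


def rle_decode(runs):
    """Expand (value, start_row, end_row) runs back into the full column."""
    column = []
    for value, start, end in runs:
        column.extend([value] * (end - start + 1))
    return column


# Hypothetical column: rows 1-99, rows 100-200, and rows 201-205 each hold one date.
column = ["2015-10-28"] * 99 + ["2015-10-29"] * 101 + ["2015-10-30"] * 5
runs = rle_encode(column)
for value, start, end in runs:
    print(f"'{value}';{start}-{end}")
```

In this sketch, 205 stored values shrink to three runs, and `rle_decode(runs)` reproduces the original column.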

Teradata Columnar & Dictionary Encoding

If column values are not repeated consecutively, dictionary encoding can be used for compression. Each unique value is stored once in a dictionary, and the column itself stores only a short code that refers to the dictionary entry. The entries in the dictionary are of a fixed length, making navigation simple. In Teradata Columnar, each container has its own dictionary.

For example, the customer segments “Business” and “Private” can be saved as dictionary entries 1 and 2:

1, “Business”; 2, “Private”

The column values are then stored as a sequence of 1s and 2s, for example, 1, 2, 1, 2, 2, 2, 1, 1, 2, 1, and so on.

This mapping reduces disk space utilization because each long column value is replaced by a much shorter code.
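
A minimal Python sketch of this mapping follows. The names `dict_encode` and `dict_decode` are hypothetical, and assigning codes in first-seen order starting at 1 is an assumption chosen to match the example above, not a statement about how Teradata numbers its dictionary entries.

```python
def dict_encode(values):
    """Return (dictionary, codes); codes are assigned in first-seen order from 1."""
    dictionary = {}  # value -> code
    codes = []
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary) + 1
        codes.append(dictionary[value])
    return dictionary, codes


def dict_decode(dictionary, codes):
    """Rebuild the original values from the dictionary and the code sequence."""
    reverse = {code: value for value, code in dictionary.items()}
    return [reverse[code] for code in codes]


# Hypothetical contents of one container of a customer segment column.
segments = [
    "Business", "Private", "Business", "Private", "Private",
    "Private", "Business", "Business", "Private", "Business",
]
dictionary, codes = dict_encode(segments)
print(dictionary)
print(codes)
```

Each eight- or seven-character string is stored once in the dictionary, and every row carries only a small integer code.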

Teradata Columnar & Delta Compression

Teradata Columnar also uses delta compression, a more sophisticated technique than run-length and dictionary encoding.

If the column values in a container fall within a narrow range, only each value's deviation from the container's average value is stored. Consider the following column values:

10,20,50,100,20,10

The average of these values is 210 / 6 = 35, so delta compression stores the following offsets from 35:

-25,-15,+15,+65,-15,-25

The benefit of replacing the original values with smaller numbers lies in the data type. If the column container stores BIGINT values (8 bytes each), the offsets can be saved as SMALLINT values (2 bytes each), which frees up 6 bytes per row.
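
The arithmetic can be checked with a short Python sketch. The names `delta_encode` and `delta_decode` are hypothetical, rounding the average down with integer division is an assumption, and the byte counts simply model 8-byte BIGINT and 2-byte SMALLINT fields rather than Teradata's actual on-disk layout.

```python
import struct


def delta_encode(values):
    """Return (base, offsets), where base is the integer average of the values."""
    base = sum(values) // len(values)  # assumption: average rounded down
    return base, [value - base for value in values]


def delta_decode(base, offsets):
    """Rebuild the original values by adding each offset to the base."""
    return [base + offset for offset in offsets]


values = [10, 20, 50, 100, 20, 10]
base, offsets = delta_encode(values)

BIGINT_BYTES = struct.calcsize("<q")    # 8-byte signed integer
SMALLINT_BYTES = struct.calcsize("<h")  # 2-byte signed integer

# Uncompressed: every value stored as a BIGINT.
raw_bytes = BIGINT_BYTES * len(values)
# Delta-compressed: one BIGINT base plus one SMALLINT offset per row.
delta_bytes = BIGINT_BYTES + SMALLINT_BYTES * len(offsets)

print(base, offsets)
print(raw_bytes, delta_bytes)
```

Under these assumptions the six values need 48 bytes as BIGINTs but only 20 bytes as one base plus six SMALLINT offsets, and the saving grows with the number of rows because the base is stored only once.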

Conclusion

Teradata Columnar’s compression methods generally achieve considerably better compression than the multivalue compression (MVC) typically used for row-based data.

Teradata chooses the compression algorithm automatically, although this choice can be overridden. Compression techniques can differ between the columns of a table and between the containers of a single column. Several techniques can even be combined within one column.
