Introduction to Teradata Compression

Note: Teradata Block Level Compression is now permanently enabled and cannot be turned off. Nonetheless, this article remains useful for current Teradata systems utilizing block-level compression. It demonstrates the continued advantages of multi-value compression (Teradata MVC).

With Teradata’s introduction of block-level compression, the utilization of multi-value compression at various Teradata sites has decreased. At first, glance, disregarding the use of MVC appears logical. However, identifying the most appropriate prospects for MVC involves considerable exertion, which is frequently not reasonable in the current era where the storage cost is decreasing, but consultation is costly.

From the start, I want to clarify one thing: MVC and BLC serve distinct application purposes.

MVC aims to enhance the performance of queries by compressing data into blocks for processing. Additionally, it naturally reduces disk space requirements.

BLC does not aid in performance tuning. Teradata must immediately decompress data blocks upon transfer to main memory to enable processing. Therefore, BLC only conserves space and is best utilized for infrequently accessed data, assuming it can be deactivated on your particular system.

Interaction of BLC and MVC

We have four options: not using compression (unavailable on modern Teradata systems as BLC is always enabled), using only MVC (which also applies), using only BLC, or using both. We created a test environment below to measure the impact on resource consumption (such as CPU seconds for packing and unpacking and disk IOs) and space consumption.

In MVC, the columns and compression were carefully selected to ensure that nearly everything is compressed.

Our test setup comprises four tables, as depicted below.

SET QUERY_BAND = 'BLOCKCOMPRESSION=NO;' FOR SESSION;
CREATE SET TABLE DWHPRO.Customer_MVC
(
 CustomerId BIGINT NOT NULL,
 CategoryCd CHAR(100) COMPRESS ('PLATIN','GOLDEN','SILVER','BRONZE‘)
)
PRIMARY INDEX ( CustomerId );
SET QUERY_BAND = 'BLOCKCOMPRESSION=NO;' FOR SESSION;
CREATE SET TABLE DWHPRO.Customer_NONE
(
 CustomerId BIGINT NOT NULL,
 CategoryCd CHAR(100)
)
PRIMARY INDEX ( CustomerId );
SET QUERY_BAND = 'BLOCKCOMPRESSION=YES;' FOR SESSION;
CREATE SET TABLE DWHPRO.Customer_BLC_MVC
(
 CustomerId BIGINT NOT NULL,
 CategoryCd CHAR(100) COMPRESS ('PLATIN','GOLDEN','SILVER','BRONZE‘) 
)
PRIMARY INDEX ( CustomerId );
SET QUERY_BAND = 'BLOCKCOMPRESSION=YES;' FOR SESSION;
CREATE SET TABLE DWHPRO.Customer_BLC
(
 CustomerId BIGINT NOT NULL,
 CategoryCd CHAR(100)
)
PRIMARY INDEX ( CustomerId );

To be concise, we added 10 million rows to each table for greater significance. The CategoryCd column only includes four values: ‘PLATINUM’, ‘GOLD’, ‘SILVER’, and ‘BRONZE’.

Comparing the Disk IOs

The disk input/output operations result from performing a complete table scan without using indexes.

Teradata BLC
IOs needed for each combination of compression

By solely analyzing the disk IOs, we can deduce the following from the outcome:

  • Not using compression is always wrong.
  • We should combine block-level compression with MVC

Our test optimized the MVC setup to allow for the compression of nearly everything. However, in actual environments, such optimization may not be feasible.

Comparing the CPU Seconds

We will now investigate how varying compression techniques impact CPU consumption.

teradata MVC
Adding BLC to the optimized MVC does not benefit IOs but increases the used CPU seconds.

Here is the emerging picture:

  • Not using compression is always wrong.
  • Block-level compression increases the CPU seconds needed and should only be used for cold data.

Comparing the Space Usage

If we focus solely on permanent space savings, the following image arises:

  • Not using compression is always wrong.
  • MVC and BLC together have about the same space savings as BLC alone.
  • Even the highly optimized MVC setup we chose does not achieve space savings on its own as BLC alone or a combination of both.

Summary

Although the results may appear straightforward, it’s important to approach them with caution. Our scenario was designed to optimize MVC, which may not always be feasible in real-world situations. The benefits of implementing additional MVC are also limited if BLC is already enabled on most systems.

Additionally, there are few automated tools available for MVC, and the cost of hiring a consultant is no longer proportional to the price of mass storage. The days of Teradata consultants spending weeks on MVC to save costly disk space seem behind us.

It looks different here if you examine the whole thing from the CPU side. If we deal with a CPU-bound system, MVC can make sense to bring load off the machine.

Enabling BLC results in only a minor discrepancy in disk IOs, so there is no justification for using MVC.

I conclude that MVC should only be implemented if there is a high CPU usage issue. Therefore, we should prioritize our time on more productive tasks rather than searching for potential MVC candidates.

  • Brilliant article, Roland.
    Thanks for sharing the comparison metrics.
    Whenever I have a doubt, I just hop on to your blog. 🙂

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like

    >