Reduce An Already Compressed Teradata Table To 10% Of Its Size!

Reduce An Already Compressed Teradata Table To 10% Of Its Size!

Often we have already reduced the size of our tables immensely using Multivalue Compression and yet we need more space.

Shrinking A Teradata Table To A Minimum Size

I used once a simple trick for you that has allowed me to reduce a table already optimized with multivalue compression to 10% of its size.

This is a special scenario, but it is worth to keep an eye on such setups.

As an example I use the following table:

CREATE SET TABLE DWHPRO.Sales
(
      SalesDate DATE FORMAT 'YY/MM/DD' NOT NULL,
      Customer_Category_Cd CHAR(5) NOT NULL COMPRESS ('10000','20000'),
      SalesId CHAR(50) NOT NULL,
      Sales_Main_Type_Cd CHAR(1) NOT NULL COMPRESS ('A','B','C','D','E'),
      Sales_Sub_Type_Cd CHAR(5) NOT NULL COMPRESS '00001','00002','00003'),
      Sales_Value_Cd CHAR(5) NOT NULL COMPRESS ('VAL01','VAL02','VAL03'),
      Sales_Class_Cd CHAR(5) NOT NULL COMPRESS ('Cla01','Cla02'),
      Group_SalesId CHAR(50) NOT NULL COMPRESS (''),
      Currency CHAR(3) NOT NULL COMPRESS ('EUR','USD','AUD','NZD'),
      Final_Flag INTEGER NOT NULL COMPRESS 0 ,
      Price DECIMAL(38,2) COMPRESS (0.00, 100.00,200.00)
PRIMARY INDEX (SalesId)
PARTITION BY RANGE_N(
SalesDate BETWEEN DATE '2019-01-01' AND DATE '2020-12-31' EACH INTERVAL '1' DAY,NO RANGE, UNKNOWN)
UNIQUE INDEX (SalesDate,Customer_Category_Cd,SalesId,Sales_Main_Type_Cd ,Sales_Sub_Type_Cd,Sales_Value_Cd,Sales_Class_Cd ,Group_SalesId ,Currency ,Final_Flag );

As you can see SalesId is the primary index. This index distributes well and is also often used for joining.

The Primary Key was replicated via a USI.

The idea behind it is to prevent primary key duplicates from being inserted into the table.

This looks like a legitimate solution at first glance, but it turns out to be a fatal error. Why I will show you in a moment. But first, let's have a look at the size of the table:

SELECT SUM(CURRENTPERM)  / (1024**3)  AS Size_GB FROM DBC.TABLESIZE 
WHERE TABLENAME = 'Sales';
-> 125,50 GB

The next step is to check how big the table is when we drop the USI:

DROP UNIQUE INDEX (SalesDate,Customer_Category_Cd,SalesId ,Sales_Main_Type_Cd,Sales_Sub_Type_Cd,Sales_Value_Cd,
Sales_Class_Cd ,Group_SalesId ,Currency ,Final_Flag) 
ON DWHPRO.Sales; 
SELECT SUM(CURRENTPERM)  / (1024**3)  AS Size_GB FROM DBC.TABLESIZE 
WHERE TABLENAME = 'Sales';
-> 12,50 GB
teradata primary key
How can it be that the table is only a tenth of the size?

The Reason For A Huge Table Size

A USI is not compressed. Since the USI of our example includes almost all columns of the table, it is like an uncompressed copy of the base table.

Teradata Primary Key Check Solution

I always see it as a bad practice that secondary indexes are only created to force the primary key.

Most of the time it makes more sense to do this programmatically once in the ETL process. This also has the advantage that the maintenance of the secondary index is omitted which consumes additional resources.

DWHPRO Teradata Quiz
DWHPRO Teradata Quiz
Developer: Roland Wenzlofsky
Price: Free
Roland Wenzlofsky
 

Roland Wenzlofsky is a graduated computer scientist and Data Warehouse professional working with the Teradata database system for more than 20 years. He is experienced in the fields of banking and telecommunication with a strong focus on performance optimization.

>