Implementing Teradata Compression: A Guide to Managing Data Warehouse Capacity and Saving Space

 

When Teradata compression becomes pressing

Have you noticed your data warehouse nearing capacity or a significant increase in Current Perm for specific tables during data loading?

Implement compression as a supplement to your space management strategy.

Compression is a table column addition that designates frequently occurring values as “given” to avoid taking up space with each instance in the row. The value is stored in a “higher order” within the table header definition.

Before exploring the opportunities, it is important to acknowledge the constraints:

  • Not all table types can be compressed; volatile tables are exempt.
  • Compression is not supported on index or partitioning columns.
  • Data types such as VARCHAR or TIMESTAMP cannot be compressed depending on the Teradata Release you are working with.
  • You can only add compression values while space remains available in the table header section.

Now, what actions are necessary and advisable?

Ideally, reliable data demographics or forecasts should already be available in tabular form. If not, conduct a row count of the attributes you wish to compress and sort them by descending order of row number.

Compression is effective for attributes with few values, particularly for constant flags or type code columns in large tables with millions of records. We advise against compressing IDs or columns with a unique value for each row. Columns with an even distribution of numerous distinct values do not typically yield significant compression results.

 

A budget approach for getting it done

Deep technical knowledge is not required for selecting columns, gathering documentation, and making recommendations. Use the following method as a quick guide to getting results efficiently.

Apply the following filters to all columns in every table of your database:

Technical filter1:

Eliminate columns that cannot be compressed due to their technical function, such as primary index or partition columns. This filtering is irreversible at this stage and can only be changed by modifying the physical data model.

Technical filter 2:

Filter out columns with incompatible data types for compression. This criterion is more lenient because data types can be modified if the initial design relied more on speculation than caution.

A global minimum benefit filter can be used to eliminate tables that fall below the current size and have no further growth expected. If small tables are not present in the database or if you prefer not to exclude them immediately, you can rank them based on non-technical criteria such as usage in reporting.

Now, we will assess the remaining table columns to determine the concentration of values in each. A column with a high concentration of only a few values is more suitable for compression than one with numerous values spread uniformly. To measure this concentration, we will use the Herfindahl-Index (HI), a standard market concentration measure. The higher the value of the HI for a column, the more potential there is for compression.

To store the numeric basis of our decision, we keep a record of the HI value in a register table like this:

— COLUMN DEMOGRAPHICS REGISTER

CREATE MULTISET TABLE P_1111_A.DMnn_COL_HI_INDEX ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
STATUS_DT DATE FORMAT 'YYYY-MM-DD',
TableName CHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,
ColumnName CHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,
HI_INDEX DECIMAL(38,18))
PRIMARY INDEX ( TableName, ColumnName );

We insert the HI values by first preparing the insert statement for all relevant tables that we will then execute:

— HHI INDEX CALCULATION PREPARATION

sel TABLENAME, columnname,
'INSERT INTO P_1111_A.DMnn_COL_HI_INDEX SELECT CURRENT_DATE, '''|| trim(tablename) ||''','''|| trim(columnName) || ''','|| trim(columnName) || ',  SUM (CAST( (ANZAHL/GES)**2 AS DECIMAL(38,18)) ) AS HI_INDEX
FROM
( SELECT ' ||trim(columnName)|| ', CAST(COUNT(*)  AS DECIMAL(38,18))  ANZAHL FROM ' || trim(databasename) ||'.'|| trim(tablename) ||'  GROUP BY 1 ) x
JOIN
(SELECT CAST(COUNT(*)  AS DECIMAL(38,18)) AS GES FROM ' || trim(databasename) ||'.'|| trim(tablename) || ' ) GES
ON 1=1; '  AS HHI
from dbc.columns
where databasename='P_1111_A'
and columnname NOT IN ('X1_ID','X2_ID','X3_ID','X4_ID','X5_DT','X6_DT','X7_DT')
and substr(TABLENAME,1,1) IN ('P','T')
and ColumnType NOT IN ('CV','TS','BO')
order by 1,2

;

Note that the query has a few constraints on the dbc column list to remove mere technical fields and exclude non-production tables and columns with non-compressible fields.

Per the table, we select those columns with an HI high enough to prepare an insert statement of compression value candidates. For documentation purposes, we create a second register table first:

— COLUMN COMPRESSION CANDIDATES

CREATE MULTISET TABLE P_1111_A.DMnn_COL_COMP_CAND ,
NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
STATUS_DT DATE FORMAT ‘YYYY-MM-DD’,
TableName CHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,
ColumnName CHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,
COL_VALUE CHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,
VALUE_SHARE   DECIMAL(18,2)  )
PRIMARY INDEX ( TableName, ColumnName, COL_VALUE ); — FIND ALL COMPRESSION VALUE CANDIDATES

sel TABLENAME, columnname,
'INSERT INTO P_1111_A.DMnn_COL_COMP_CAND  SELECT CURRENT_DATE, '''|| trim(tablename) ||''','|| trim(columnName) || ', CAST( (ANZAHL/GES)*100 AS DECIMAL(38,2)) AS VALUE_SHARE
FROM
( SELECT ' ||trim(columnName)|| ', CAST(COUNT(*)  AS DECIMAL(38,18))  ANZAHL FROM ' || trim(databasename) ||'.'|| trim(tablename) ||'  GROUP BY 1 ) x
JOIN
(SELECT  CAST(COUNT(*)  AS DECIMAL(38,18)) AS GES , CAST(COUNT(DISTINCT ' ||trim(columnName)|| ')  AS DECIMAL(38,18)) AS DDVAL FROM ' || trim(databasename) ||'.'|| trim(tablename) || ' ) GES
ON 1=1
WHERE CAST( (ANZAHL/GES)*100 AS DECIMAL(38,2)) > CAST( (1/DDVAL)*100 AS DECIMAL(38,2))
ORDER BY 3 DESC
;'   AS COMPRESSION_VALUES
from dbc.columns
where databasename='P_1111_A'
and columnname NOT IN ('X1_ID','X2_ID','X3_ID','X4_ID','X5_DT','X6_DT','X7_DT')
and substr(TABLENAME,1,1) IN ('P','T')
and ColumnType NOT IN ('CV','TS','BO')
order by 1,2
;

You can use both register tables to put together a compression recommendation.

Register tables facilitate the periodic insertion of updated demographic information using the STATUS_DT field to record demographic changes.

Before finalizing your deployment list, consider the long-term perspective of your compression project.

What is the frequency of your compression list updates?

How frequently do you anticipate it will be required?

What is the level of volatility of the column value set over time?

As long as they exist, Yes-No flags will hold either Yes or No, while daily account balances or order numbers are temporary.

Teradata cannot answer these questions for you. Instead, the answers should come from discussions with business representatives and management — or from your own investigation if those conversations are not possible.

 

A success story about Teradata compression to inspire you

A year ago, we reviewed numerous tables with little or no compression applied. The database had reached its capacity limit, and acquiring additional space of the required scale was not feasible. An initial examination of the tables’ contents showed that compression was a viable option.

The results of introducing compression systematically were very satisfying:

While

  • No other changes than compression were implemented and
  • working in a production environment with daily growth,
  • Over 1 TB of space was saved in total,
  • an average reduction of 25% and 76.2% on a table level.

The result gave us many months more for daily production!

It paved the way for other necessary activities, leading to even more data being inserted.

Related Services

⚡ Need Help Optimizing Your Data Platform?

We cut data platform costs by 30–60% without hardware changes. 25+ years of hands-on tuning experience.

Explore Our Services →

📋 Considering a Move From Teradata?

Get a personalized migration roadmap in 2 minutes. We have migrated billions of rows from Teradata to Snowflake, Databricks, and more.

Free Migration Assessment →

📊 Data Platform Migration Survey

Help us map where the industry is heading. Results are public — see what others chose.

1. What is your current data platform?

2. Where are you migrating to (or evaluating)?

Migrating FROM
Migrating TO

Thanks for voting! Share this with your network.

Follow me on LinkedIn for daily insights on data warehousing and platform migrations.

Stay Ahead in Data Warehousing

Get expert insights on Teradata, Snowflake, BigQuery, Databricks, Microsoft Fabric, and modern data architecture — delivered to your inbox.

Leave a Comment

DWHPro

Expert network for enterprise data platforms. Senior consultants, project teams built for your challenge — across Teradata, Snowflake, Databricks, and more.

📍Vienna, Austria & Jacksonville, Florida

Quick Links
Services Team Teradata Book Blog Contact Us
Connect
LinkedIn → [email protected]
Newsletter

Join 4,000+ data professionals.
Weekly insights on Teradata, Snowflake & data architecture.