When Teradata compression becomes pressing
Have you noticed your data warehouse nearing capacity or a significant increase in Current Perm for specific tables during data loading?
Implement compression as a supplement to your space management strategy.
Compression is a table column addition that designates frequently occurring values as “given” to avoid taking up space with each instance in the row. The value is stored in a “higher order” within the table header definition.
Before exploring the opportunities, it is important to acknowledge the constraints:
- Not all table types can be compressed; volatile tables are exempt.
- Compression is not supported on index or partitioning columns.
- Data types such as VARCHAR or TIMESTAMP cannot be compressed depending on the Teradata Release you are working with.
- You can only add compression values while space remains available in the table header section.
Now, what actions are necessary and advisable?
Ideally, reliable data demographics or forecasts should already be available in tabular form. If not, conduct a row count of the attributes you wish to compress and sort them by descending order of row number.
Compression is effective for attributes with few values, particularly for constant flags or type code columns in large tables with millions of records. We advise against compressing IDs or columns with a unique value for each row. Columns with an even distribution of numerous distinct values do not typically yield significant compression results.
A budget approach for getting it done
Deep technical knowledge is not required for selecting columns, gathering documentation, and making recommendations. Use the following method as a quick guide to getting results efficiently.
Apply the following filters to all columns in every table of your database:
Technical filter1:
Eliminate columns that cannot be compressed due to their technical function, such as primary index or partition columns. This filtering is irreversible at this stage and can only be changed by modifying the physical data model.
Technical filter 2:
Filter out columns with incompatible data types for compression. This criterion is more lenient because data types can be modified if the initial design relied more on speculation than caution.
A global minimum benefit filter can be used to eliminate tables that fall below the current size and have no further growth expected. If small tables are not present in the database or if you prefer not to exclude them immediately, you can rank them based on non-technical criteria such as usage in reporting.
Now, we will assess the remaining table columns to determine the concentration of values in each. A column with a high concentration of only a few values is more suitable for compression than one with numerous values spread uniformly. To measure this concentration, we will use the Herfindahl-Index (HI), a standard market concentration measure. The higher the value of the HI for a column, the more potential there is for compression.
To store the numeric basis of our decision, we keep a record of the HI value in a register table like this:
— COLUMN DEMOGRAPHICS REGISTER
CREATE MULTISET TABLE P_1111_A.DMnn_COL_HI_INDEX ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
STATUS_DT DATE FORMAT 'YYYY-MM-DD',
TableName CHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,
ColumnName CHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,
HI_INDEX DECIMAL(38,18))
PRIMARY INDEX ( TableName, ColumnName );
We insert the HI values by first preparing the insert statement for all relevant tables that we will then execute:
— HHI INDEX CALCULATION PREPARATION
sel TABLENAME, columnname,
'INSERT INTO P_1111_A.DMnn_COL_HI_INDEX SELECT CURRENT_DATE, '''|| trim(tablename) ||''','''|| trim(columnName) || ''','|| trim(columnName) || ', SUM (CAST( (ANZAHL/GES)**2 AS DECIMAL(38,18)) ) AS HI_INDEX
FROM
( SELECT ' ||trim(columnName)|| ', CAST(COUNT(*) AS DECIMAL(38,18)) ANZAHL FROM ' || trim(databasename) ||'.'|| trim(tablename) ||' GROUP BY 1 ) x
JOIN
(SELECT CAST(COUNT(*) AS DECIMAL(38,18)) AS GES FROM ' || trim(databasename) ||'.'|| trim(tablename) || ' ) GES
ON 1=1; ' AS HHI
from dbc.columns
where databasename='P_1111_A'
and columnname NOT IN ('X1_ID','X2_ID','X3_ID','X4_ID','X5_DT','X6_DT','X7_DT')
and substr(TABLENAME,1,1) IN ('P','T')
and ColumnType NOT IN ('CV','TS','BO')
order by 1,2
;
Note that the query has a few constraints on the dbc column list to remove mere technical fields and exclude non-production tables and columns with non-compressible fields.
Per the table, we select those columns with an HI high enough to prepare an insert statement of compression value candidates. For documentation purposes, we create a second register table first:
— COLUMN COMPRESSION CANDIDATES
CREATE MULTISET TABLE P_1111_A.DMnn_COL_COMP_CAND ,
NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
STATUS_DT DATE FORMAT ‘YYYY-MM-DD’,
TableName CHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,
ColumnName CHAR(30) CHARACTER SET LATIN NOT CASESPECIFIC,
COL_VALUE CHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,
VALUE_SHARE DECIMAL(18,2) )
PRIMARY INDEX ( TableName, ColumnName, COL_VALUE ); — FIND ALL COMPRESSION VALUE CANDIDATES
sel TABLENAME, columnname,
'INSERT INTO P_1111_A.DMnn_COL_COMP_CAND SELECT CURRENT_DATE, '''|| trim(tablename) ||''','|| trim(columnName) || ', CAST( (ANZAHL/GES)*100 AS DECIMAL(38,2)) AS VALUE_SHARE
FROM
( SELECT ' ||trim(columnName)|| ', CAST(COUNT(*) AS DECIMAL(38,18)) ANZAHL FROM ' || trim(databasename) ||'.'|| trim(tablename) ||' GROUP BY 1 ) x
JOIN
(SELECT CAST(COUNT(*) AS DECIMAL(38,18)) AS GES , CAST(COUNT(DISTINCT ' ||trim(columnName)|| ') AS DECIMAL(38,18)) AS DDVAL FROM ' || trim(databasename) ||'.'|| trim(tablename) || ' ) GES
ON 1=1
WHERE CAST( (ANZAHL/GES)*100 AS DECIMAL(38,2)) > CAST( (1/DDVAL)*100 AS DECIMAL(38,2))
ORDER BY 3 DESC
;' AS COMPRESSION_VALUES
from dbc.columns
where databasename='P_1111_A'
and columnname NOT IN ('X1_ID','X2_ID','X3_ID','X4_ID','X5_DT','X6_DT','X7_DT')
and substr(TABLENAME,1,1) IN ('P','T')
and ColumnType NOT IN ('CV','TS','BO')
order by 1,2
;
You can use both register tables to put together a compression recommendation.
Register tables facilitate the periodic insertion of updated demographic information using the STATUS_DT field to record demographic changes.
Before finalizing your deployment list, consider the long-term perspective of your compression project.
What is the frequency of your compression list updates?
How frequently do you anticipate it will be required?
What is the level of volatility of the column value set over time?
As long as they exist, Yes-No flags will hold either Yes or No, while daily account balances or order numbers are temporary.
Teradata cannot answer these questions for you. Instead, the answers should come from discussions with business representatives and management — or from your own investigation if those conversations are not possible.
A success story about Teradata compression to inspire you
A year ago, we reviewed numerous tables with little or no compression applied. The database had reached its capacity limit, and acquiring additional space of the required scale was not feasible. An initial examination of the tables’ contents showed that compression was a viable option.
The results of introducing compression systematically were very satisfying:
While
- No other changes than compression were implemented and
- working in a production environment with daily growth,
- Over 1 TB of space was saved in total,
- an average reduction of 25% and 76.2% on a table level.
The result gave us many months more for daily production!
It paved the way for other necessary activities, leading to even more data being inserted.
Related Services
⚡ Need Help Optimizing Your Data Platform?
We cut data platform costs by 30–60% without hardware changes. 25+ years of hands-on tuning experience.
Explore Our Services →📋 Considering a Move From Teradata?
Get a personalized migration roadmap in 2 minutes. We have migrated billions of rows from Teradata to Snowflake, Databricks, and more.
Free Migration Assessment →