The Teradata DBMS has long supported compression, in a growing number of forms.
Version 5.0 introduced Multi-Value Compression (MVC), version 13.10 introduced Algorithmic (ALC) and Block Level (BLC) compression, and version 14.0 introduced multiple automatic compression options for Columnar tables.
These can produce disk space savings and associated performance gains; however, implementing Teradata compression with MVC or ALC takes analyst time, which has a cost.
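For anyone less familiar with the column-level options, here is a minimal DDL sketch of what MVC and ALC look like on a table (the table and column names are purely illustrative):

```sql
CREATE TABLE sales.txn_detail_demo (
    txn_id       INTEGER,
    -- Multi-Value Compression: the listed values are stored once in the
    -- table header rather than in every row.
    country_code CHAR(3)  COMPRESS ('USA', 'GBR', 'AUS'),
    status_cd    CHAR(1)  COMPRESS ('A', 'C'),
    -- Algorithmic Compression (13.10+): a compress/decompress function pair
    -- is applied to the column, at some CPU cost.
    comment_txt  VARCHAR(1000) CHARACTER SET UNICODE
                 COMPRESS USING TD_SYSFNLIB.TransUnicodeToUTF8
                 DECOMPRESS USING TD_SYSFNLIB.TransUTF8ToUnicode
)
PRIMARY INDEX (txn_id);
```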
It also needs to be remembered that data demographics change over time, so the analysis behind either of these types of compression needs to be revisited regularly (even if that is only once a year).
Yes, some tools will help with this, but the point of this article is to ask whether the time you spend on compression is worthwhile. Is it giving you benefits that are commensurate with the costs? Like many things, this is a “cost vs. benefit” question.
Consider, as an example, the figures from one Teradata site:
- They have about 98000 tables.
- About 26000 tables have MVC implemented.
- On their system, only 4000 tables are big enough that a full table scan is estimated to take longer than 1 second, even allowing for a reduced scan rate because of concurrent queries (a sketch of how to estimate this follows the list).
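One way to get a feel for that breakdown on your own system is a quick query against the data dictionary. This is only a sketch: the 2 GB/sec effective scan rate is an assumption, and you should substitute a rate measured on your own platform.

```sql
-- Estimate full-table-scan time per table from DBC.TableSizeV,
-- assuming an effective scan rate of ~2 GB/sec.
SELECT  DatabaseName,
        TableName,
        SUM(CurrentPerm) / (1024*1024*1024.0)      AS size_gb,
        SUM(CurrentPerm) / (2.0 * 1024*1024*1024)  AS est_scan_seconds
FROM    DBC.TableSizeV
GROUP BY DatabaseName, TableName
HAVING  SUM(CurrentPerm) > 2 * 1024*1024*1024      -- i.e. estimated scan > 1 second
ORDER BY est_scan_seconds DESC;
```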
I am not recommending that you remove existing MVC compression from small tables. MVC imposes essentially no CPU overhead, and eliminating compression on many tables would be a lot of effort for negligible gain. If you have ALC applied to a small table, though, removing it might give you some CPU savings, since ALC (unlike MVC) does have a CPU cost.
When designing new tables, or possibly when making changes to existing tables, I would recommend that the compression settings be carefully considered.
If the table is likely to be small, I would consider not applying any compression at all. This is more a process change than remedial action that needs to be taken on existing tables.
I am not sure if anyone else has come across this issue and whether there might be a simple solution. At the Teradata site I am working at, they have not been able to use MVC because it impacts NetBackup's ability to do deduplication.
Currently they run TARA with a NetBackup plugin and achieve a fair amount of deduplication in the backups. I have been advised that when they attempted to add MVC to the Teradata tables, deduplication failed on NetBackup, so they have not implemented it.
I believe they could save space, and possibly defer a system upgrade, by implementing MVC, but it would appear to come at a cost in SAN storage for the backups.
Hi Nathan,
Thanks for your comments. I think your comments actually reinforce the point that I’m trying to make.
You’re absolutely right that a lot of the time small tables get joined to big tables, but precisely because they are small, compression won’t give you much benefit. If a table starts out at 1 MB, you can reduce its size by at most 1 MB, and if on a given system a 1 MB table can be scanned in under 1 second, then you can save at most 1 second of processing on a query.
Given that (without using any tools) this analysis takes a lot of time and effort – which has a cost – are the savings worth the cost? That is really the point of my article.
Cheers,
Dave
David,
One thing to consider with small-table compression is that small tables are (very) often joined to big tables, resulting in large intermediate result sets in spool. Multi-Value Compression is preserved in spool, so applying MVC to columns in small tables can give you benefits in reduced spool size, with the associated performance gains from reduced IO.
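To illustrate that scenario (the database, table, and column names here are hypothetical): a compressed column carried from a small dimension into a join result continues to benefit from MVC in the spool rows.

```sql
-- Hypothetical small dimension with MVC on a low-cardinality column.
CREATE TABLE ref.store_dim (
    store_id    INTEGER,
    region_name VARCHAR(30) COMPRESS ('North', 'South', 'East', 'West')
)
UNIQUE PRIMARY INDEX (store_id);

-- Joining it to a large fact table puts one region_name into spool per fact
-- row; because MVC is preserved in spool, that spool stays smaller than it
-- would with an uncompressed column.
SELECT  f.txn_dt,
        d.region_name,
        SUM(f.sales_amt) AS sales_amt
FROM    sales.txn_fact f
JOIN    ref.store_dim  d ON d.store_id = f.store_id
GROUP BY f.txn_dt, d.region_name;
```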
I certainly agree with your advice that the choice of compression should be carefully considered. It’s worth weighing the benefit in terms of how heavily a table is used as well as its size. Ideally, perform a benchmark with some real data and queries to provide the facts and figures for an informed decision.
@Roland: In my personal opinion Teradata doesn’t make a compression utility because our software partners, such as Ward Analytics and Atanasoft, make excellent tools for analyzing and implementing compression. If you have a look at the Teradata developer exchange (https://developer.teradata.com/), you can find articles discussing how to do compression analysis manually, which you can easily adapt and automate for your own use.
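As a rough illustration of where that manual analysis usually starts, a value-frequency query for a candidate column shows which values are worth listing (the database, table, and column names below are hypothetical):

```sql
-- Candidate-value profiling for one column. Values covering a large share of
-- the rows are the usual candidates for the COMPRESS value list (Teradata
-- allows up to 255 compress values per column).
SELECT  TOP 255
        country_code,
        COUNT(*) AS row_cnt
FROM    sales.txn_detail
GROUP BY country_code
ORDER BY row_cnt DESC;
```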
Regards,
Nathan Green.
Senior Ecosystem Architect
Teradata Pty Ltd.
Hi David. The problem is that Teradata (the vendor) does not offer any tool for compression (to me it’s obvious why). Doing compression manually is very costly, but it is still a frequently applied strategy. As compression is an optimization problem over a bunch of variables, I think the best way to go is to use third-party tools.