fbpx

Multivalue compression and the indication of nullability are different matters, but with one common task: Collecting information about column values in every row. To keep space requirements low, Teradata stores information about nullability and multivalue compress in the so-called presence bytes.

The alert reader might recognize that the name “presence byte” suggests that it indicates NULL values. In earlier releases of Teradata, this naming convention was introduced as only the nullability information was stored, and multivalue compression did not exist.

Each row is always equipped with at least one presence byte. More presence bytes are added if needed. Only 7 bits of the first presence byte store information about nullability or multivalue compression, as the first bit, is occupied by the system (and it’s always set to 1).

How is Teradata storing Information about NULL Values?

For each nullable column, one bit per row is used to find NULL values. The bits being used to show nullability are always stored in the bits used by the multivalue compression feature (which means the bits are used from right to left).  Columns defined as NOT NULL, of course, don’t need the nullability indicator bit.

The logic of indicating NULL values is simple:

  • If the presence bit is set to 1, a NULL value is present in the column.
  • If the presence bit is set to 0, any other value is present in the column.

See below an example table containing eight rows. As TheCol is nullable, an indicator bit is used to show rows containing NULL values in TheCol (as mentioned earlier, the first bit is always set to 1 as the system utilizes it):

Compress3

We already said that the only 7 bits of the first presence byte could be used. We can cover seven nullable columns with the always existing presence byte. Each additional presence byte can store the null value information of 8 more nullable columns.

If we increase the number of nullable columns from 7 to 8, an extra presence byte is added. A result is that each row consumes one more byte of permanent disk space.
Keep the following limits in mind when designing your tables:

Up to 7   nullable columns -> The already existing presence byte is used.
Up to 15 nullable columns -> 1 presence byte is added (2 in total)
Up to 23 nullable columns -> 2 presence bytes are added (3 in total)

It’s recommended that columns are defined as NOT NULL when possible, to keep the table size as small as possible. Furthermore, unused bits of a presence byte can be used by the multivalue compression feature.

Teradata Multivalue Compression and the Presence Bytes

Presence bytes are also used in multivalue compression.  For each column, up to 255 different values can be compressed.

The original column values are kept in the table header, and each row will only hold the binary number, representing its corresponding column value. If we are storing 255 different values, 8 bits are needed (a total presence byte).

The algorithm will only occupy as many bits as needed to encode the original column value:

Different ValuesBits needed
 1 1
 3 2
 7 3
 15 4
 31 5
 63 6
 127 7
 255 8

Let’s do another example. The below table has eight rows.  If we compress 3 column values ‘A’,’B’,’C’, we need two presence bits: ‘A’ could be represented by the binary number 01,  ‘B’ by binary 10, ‘C’ by binary 11 (the presence bits are occupied from right to left):

Compress1

We extend our example by making TheCol additionally nullable. In this case, one presence bit is needed to show rows containing NULL values in column TheCol. The compression is unchanged, just shifted by one bit to the left, as nullability bits are always stored first:

Compress2
As soon as we compress at least one value of TheCol, NULL values are automatically compressed and without any more cost. The column’s nullability bit can be used for NULL values. No encoding and storing in the table header is required.

By using the same presence bytes for multivalue compression and nullability simultaneously, space consumption can be reduced. Nevertheless, when designing your tables, always think about the joint costs of compressed columns and nullability indicators and how one extra compress value or nullable column will influence the size of your table.

__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"62516":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"62516":{"val":"var(--tcb-skin-color-0)"}},"gradients":[]},"original":{"colors":{"62516":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"b4fbe":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"b4fbe":{"val":"rgb(241, 99, 52)"}},"gradients":[]},"original":{"colors":{"b4fbe":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
Previous Article
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"b4fbe":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"b4fbe":{"val":"rgb(241, 99, 52)"}},"gradients":[]},"original":{"colors":{"b4fbe":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
Next Article
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"62516":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"62516":{"val":"rgb(255, 0, 0)"}},"gradients":[]}}]}__CONFIG_colors_palette__
GET OUR TERADATA BOOK

Roland Wenzlofsky

Roland Wenzlofsky is an experienced freelance Teradata Consultant & Performance Trainer. Born in Austria's capital Vienna, he is building and tuning some of the largest Teradata Data Warehouses in the European financial and telecommunication sectors for more than 20 years. He has played all the roles of developer, designer, business analyst, and project manager. Therefore, he knows like no other the problems and obstacles that make many data warehouse projects fail and all the tricks and tips that will help you succeed.

  • Jayant, that right most bit is used by the system and will always be populated as 1 as you can see in all other examples

  • Avatar
    Jayant Singh says:

    Good morning Sir,

    Very nice explanation of multivalue compression here. kudos to you. I am a beginner at Teradata, but I like to learn new things.

    One question, I had in the example of multivalue compression (with the not nullable condition). When we represent the value ‘A’ by 01 in the first row of the table ‘The table’. My question is why the presence byte is looking like 00001011. What is the fourth ‘1’ (right to left) signifying there?

    your response is greatly appreciated.

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like

    >