fbpx

In this article, we will show you how Block-Level Compression, Row-level partitioning, and the selected data block size of a table can affect performance and table size. For this demonstration, we need a test table on which we will make various changes to these settings. Our test table includes customer data and a cut-off date. The table is loaded with customer master data on a daily basis. By its nature, it is therefore very highly compressible, since customer master data rarely changes.

CREATE MULTISET TABLE TestTable_PI ,FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO,
     MAP = TD_MAP1
     (
      COL1 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL2 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC, -- Customer Number
      COL3 VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL4 BIGINT,
      COL5 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL6 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL7 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL8 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL9 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL10 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL11 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL12 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL13 DATE FORMAT 'YY/MM/DD', -- Reference Date
      COL14 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL15 DECIMAL(18,8),
      COL16 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL17 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL18 VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL19 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL20 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL21 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL22 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL23 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL24 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL25 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL26 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL27 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL28 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL29 DATE FORMAT 'YY/MM/DD',
      COL30 VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC,
      COL31 VARCHAR(32) CHARACTER SET UNICODE NOT CASESPECIFIC,
      COL32 VARCHAR(32) CHARACTER SET UNICODE NOT CASESPECIFIC,
      COL33 VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC,
      COL34 VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC)
PRIMARY INDEX  ( COL2 );
SELECT TABLENAME,SUM(CURRENTPERM) / (1024**3) FROM DBC.TABLESIZE
WHERE TABLENAME LIKE '%TESTTABLE%'
GROUP BY 1
;

TableName         (Sum(CurrentPerm)/(1024**3))
TestTable_PI    1,41

The table above is not partitioned and occupies 1.4GB. We can assume that the data is highly block-level compressed since most customer master data will not change daily. Since the table is not row-level partitioned, the rows of a customer are packed together into data blocks.

Next, we partition the table by the reference date and check how this impacts table size:

CREATE MULTISET TABLE TestTable_PPI ,FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO,
     MAP = TD_MAP1
     (
      COL1 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL2 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC, -- Customer Number
      COL3 VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL4 BIGINT,
      COL5 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL6 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL7 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL8 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL9 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL10 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL11 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL12 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL13 DATE FORMAT 'YY/MM/DD', -- Reference Date
      COL14 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL15 DECIMAL(18,8),
      COL16 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL17 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL18 VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL19 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL20 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL21 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL22 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL23 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL24 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL25 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL26 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL27 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL28 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL29 DATE FORMAT 'YY/MM/DD',
      COL30 VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC,
      COL31 VARCHAR(32) CHARACTER SET UNICODE NOT CASESPECIFIC,
      COL32 VARCHAR(32) CHARACTER SET UNICODE NOT CASESPECIFIC,
      COL33 VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC,
      COL34 VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC)
PRIMARY INDEX  ( COL2 ) PARTITION BY RANGE_N(COL13  BETWEEN DATE '2015-01-01' AND 
 DATE '2030-12-31' EACH INTERVAL '1' DAY ,  NO RANGE, UNKNOWN);
INSERT INTO TestTable_PPI
SELECT * FROM TestTable_PI; 
 
TableName                     (Sum(CurrentPerm)/(1024**3))
TestTable_PPI                 10,67

As we can see, the table is now over 10GB in size! Why is that? Well, let’s remember how rows are stored for partitioned tables: First, they are hashed to the respective AMP, then sorted into the partitions. I.e. the rows of the same client are now distributed across many partitions. Due to the missing locality of the data, block-level compression cannot work as well.

In another experiment we will now reduce the data block size of the table to a minimum:

CREATE MULTISET TABLE TestTable_PPI_SmallBlock ,FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO,
     DATABLOCKSIZE = 21504 ,
     MAP = TD_MAP1
     (
      COL1 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL2 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC, -- Customer Number
      COL3 VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL4 BIGINT,
      COL5 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL6 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL7 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL8 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL9 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL10 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL11 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL12 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL13 DATE FORMAT 'YY/MM/DD', -- Reference Date
      COL14 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL15 DECIMAL(18,8),
      COL16 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL17 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL18 VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL19 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL20 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL21 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL22 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL23 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL24 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL25 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL26 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL27 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL28 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL29 DATE FORMAT 'YY/MM/DD',
      COL30 VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC,
      COL31 VARCHAR(32) CHARACTER SET UNICODE NOT CASESPECIFIC,
      COL32 VARCHAR(32) CHARACTER SET UNICODE NOT CASESPECIFIC,
      COL33 VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC,
      COL34 VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC)
PRIMARY INDEX  ( COL2 ) PARTITION BY RANGE_N(COL13  BETWEEN DATE '2015-01-01' AND 
 DATE '2030-12-31' EACH INTERVAL '1' DAY ,  NO RANGE, UNKNOWN);
INSERT INTO TestTable_PPI_SmallBlock
SELECT * FROM TestTable_PI; 
TableName         (Sum(CurrentPerm)/(1024**3))

TestTable_PPI_SmallBlock           18,80

The table is now even 18GB in size! The reason is that the rows have to be distributed over a bigger number of data blocks. As the name Block-level Compression says, data blocks are the unit of compression. The more data blocks are needed, the worse is the compression rate.

Finally, we can control the size of the table by the number of partitions:

CREATE MULTISET TABLE TestTable_PPI_Less_Partitions ,FALLBACK ,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL,
     CHECKSUM = DEFAULT,
     DEFAULT MERGEBLOCKRATIO,
     DATABLOCKSIZE = 21504 ,
     MAP = TD_MAP1
     (
      COL1 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL2 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC, -- Customer Number
      COL3 VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL4 BIGINT,
      COL5 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL6 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL7 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL8 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL9 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL10 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL11 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL12 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL13 DATE FORMAT 'YY/MM/DD', -- Reference Date
      COL14 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL15 DECIMAL(18,8),
      COL16 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL17 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL18 VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL19 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL20 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL21 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL22 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL23 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL24 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL25 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL26 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL27 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL28 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,
      COL29 DATE FORMAT 'YY/MM/DD',
      COL30 VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC,
      COL31 VARCHAR(32) CHARACTER SET UNICODE NOT CASESPECIFIC,
      COL32 VARCHAR(32) CHARACTER SET UNICODE NOT CASESPECIFIC,
      COL33 VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC,
      COL34 VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC)
PRIMARY INDEX  ( COL2 ) PARTITION BY RANGE_N(COL13  BETWEEN DATE '2015-01-01' AND 
 DATE '2030-12-31' EACH INTERVAL '1' MONTH,  NO RANGE, UNKNOWN);

We reduce the number of partitions by partitioning not by days but by months:

INSERT INTO TestTable_PPI_Less_Partitions
SELECT * FROM TestTable_PI; 
TableName                              (Sum(CurrentPerm)/(1024**3))
TestTable_PPI_Less_Partitions           4,42

Now an average of 30 days can be packed into one partition. This means that the rows are again closer together in data blocks, and block-level compression works better.

What do we learn from this? As we have seen, by controlling these parameters we can reduce or increase the table size depending on the requirements, which of course always has an effect on the performance of queries.

Small data blocks and many partitions are helpful for tactical workloads and querying fewer rows, but tend to cause larger tables for well compressible data.

Large data blocks and few partitions are useful for large scans, and take up less space for well compressible tables.

Buy now at Amazon
{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

You might also like

>