BLC, Row Level Partitioning, and Datablock Size Tuning

In this article, we will learn how block-level compression, row-level partitioning, and selection of the appropriate data block size can be used to influence the Teradata table size and, thus, performance. We will create a test scenario based on a customer master table loaded daily. We will modify the DDL statement of the table several times to demonstrate the influence of BLC, row-level partitioning, and data block size. Customer master data is very efficient to compress because master data rarely changes.

CREATE MULTISET TABLE TestTable_PI ,FALLBACK ,NO BEFORE JOURNAL,NO AFTER JOURNAL,CHECKSUM = DEFAULT,DEFAULT MERGEBLOCKRATIO,MAP = TD_MAP1(COL1 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL2 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC, — Customer NumberCOL3 VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,COL4 BIGINT,COL5 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL6 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL7 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL8 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL9 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL10 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL11 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL12 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL13 DATE FORMAT ‘YY/MM/DD’, — Reference DateCOL14 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL15 DECIMAL(18,8),COL16 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL17 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,COL18 VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,COL19 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL20 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,COL21 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL22 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL23 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,COL24 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,COL25 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL26 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL27 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,COL28 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,COL29 DATE FORMAT ‘YY/MM/DD’,COL30 VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC,COL31 VARCHAR(32) CHARACTER SET UNICODE NOT CASESPECIFIC,COL32 VARCHAR(32) CHARACTER SET UNICODE NOT CASESPECIFIC,COL33 VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC,COL34 VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC)PRIMARY INDEX ( COL2 );

SELECT TABLENAME,SUM(CURRENTPERM) / (1024**3) FROM DBC.TABLESIZEWHERE TABLENAME LIKE ‘%TESTTABLE%’GROUP BY 1;TableName (Sum(CurrentPerm)/(1024**3))TestTable_PI 1,41

As the above query shows, the table which is not Row Level Partitioned is about 1.41 gigabytes in size. We can assume that BLC compressed the table highly efficiently (see our explanations from before). Since the table is not row level partitioned, the rows of a customer are clustered by the primary index, in our case, the customer number, which is stored in column Col2, in one or a few data blocks.

Teradata Table size Tuning with Row Level Partitioning

In the following setup, we partition the same table by a reference date, which is stored in Column Col13:

CREATE MULTISET TABLE TestTable_PPI ,FALLBACK ,NO BEFORE JOURNAL,NO AFTER JOURNAL,CHECKSUM = DEFAULT,DEFAULT MERGEBLOCKRATIO,MAP = TD_MAP1(COL1 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL2 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC, — Customer NumberCOL3 VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,COL4 BIGINT,COL5 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL6 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL7 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL8 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL9 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL10 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL11 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL12 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL13 DATE FORMAT ‘YY/MM/DD’, — Reference DateCOL14 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL15 DECIMAL(18,8),COL16 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL17 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,COL18 VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,COL19 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL20 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,COL21 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL22 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL23 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,COL24 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,COL25 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL26 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL27 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,COL28 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,COL29 DATE FORMAT ‘YY/MM/DD’,COL30 VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC,COL31 VARCHAR(32) CHARACTER SET UNICODE NOT CASESPECIFIC,COL32 VARCHAR(32) CHARACTER SET UNICODE NOT CASESPECIFIC,COL33 VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC,COL34 VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC)PRIMARY INDEX  ( COL2 ) PARTITION BY RANGE_N(COL13  BETWEEN DATE ‘2015-01-01’ ANDDATE ‘2030-12-31’ EACH INTERVAL ‘1’ DAY , NO RANGE, UNKNOWN);

INSERT INTO TestTable_PPISELECT * FROM TestTable_PI;TableName (Sum(CurrentPerm)/(1024**3))TestTable_PPI 10,67

The result is somewhat surprising in terms of scale. The table size has grown many times. It is now 10.67 gigabytes in size. To understand what happened, we have to remember how rows of partitioned tables are stored: In the first step, the rows of a customer are all set to the same AMP (since the customer number is the primary index). But then they are divided into different partitions according to the reference date. Therefore they are scattered over many data blocks, and the locality as in the previous example is no longer given. BLC can consequently no longer compress as well.

Teradata Table Size Tuning with Data Block Size

In the next test, we will see how the data block size can affect the size of a table:

CREATE MULTISET TABLE TestTable_PPI_SmallBlock ,FALLBACK ,NO BEFORE JOURNAL,NO AFTER JOURNAL,CHECKSUM = DEFAULT,DEFAULT MERGEBLOCKRATIO, DATABLOCKSIZE = 21504 ,MAP = TD_MAP1(COL1 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL2 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC, — Customer NumberCOL3 VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,COL4 BIGINT,COL5 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL6 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL7 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL8 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL9 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL10 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL11 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL12 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL13 DATE FORMAT ‘YY/MM/DD’, — Reference DateCOL14 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL15 DECIMAL(18,8),COL16 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL17 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,COL18 VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,COL19 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL20 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,COL21 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL22 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL23 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,COL24 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,COL25 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL26 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL27 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,COL28 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,COL29 DATE FORMAT ‘YY/MM/DD’,COL30 VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC,COL31 VARCHAR(32) CHARACTER SET UNICODE NOT CASESPECIFIC,COL32 VARCHAR(32) CHARACTER SET UNICODE NOT CASESPECIFIC,COL33 VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC,COL34 VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC)PRIMARY INDEX  ( COL2 ) PARTITION BY RANGE_N(COL13  BETWEEN DATE ‘2015-01-01’ ANDDATE ‘2030-12-31’ EACH INTERVAL ‘1’ DAY , NO RANGE, UNKNOWN);

INSERT INTO TestTable_PPI_SmallBlockSELECT * FROM TestTable_PI;

TableName (Sum(CurrentPerm)/(1024**3))TestTable_PPI_SmallBlock 18,80

The table is now as large as 18.80 gigabytes. Building on our explanations from earlier, this also makes sense. The rows are now even scattered over a more significant number of data blocks, so BLC has become even more inefficient.

Further Tuning with Row Level Partitioning

Finally, let’s see how the number of partitions affects the table size. Finally, let’s see how the number of partitions affects the table size. We reduce the number of partitions by partitioning by months instead of days:

CREATE MULTISET TABLE TestTable_PPI_Less_Partitions ,FALLBACK ,NO BEFORE JOURNAL,NO AFTER JOURNAL,CHECKSUM = DEFAULT,DEFAULT MERGEBLOCKRATIO,DATABLOCKSIZE = 21504 ,MAP = TD_MAP1(COL1 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL2 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC, — Customer NumberCOL3 VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,COL4 BIGINT,COL5 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL6 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL7 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL8 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL9 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL10 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL11 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL12 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL13 DATE FORMAT ‘YY/MM/DD’, — Reference DateCOL14 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL15 DECIMAL(18,8),COL16 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL17 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,COL18 VARCHAR(100) CHARACTER SET LATIN NOT CASESPECIFIC,COL19 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL20 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,COL21 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL22 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL23 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,COL24 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,COL25 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL26 VARCHAR(50) CHARACTER SET LATIN NOT CASESPECIFIC,COL27 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,COL28 CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC,COL29 DATE FORMAT ‘YY/MM/DD’,COL30 VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC,COL31 VARCHAR(32) CHARACTER SET UNICODE NOT CASESPECIFIC,COL32 VARCHAR(32) CHARACTER SET UNICODE NOT CASESPECIFIC,COL33 VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC,COL34 VARCHAR(100) CHARACTER SET UNICODE NOT CASESPECIFIC)PRIMARY INDEX ( COL2 ) PARTITION BY RANGE_N(COL13 BETWEEN DATE ‘2015-01-01’ ANDDATE ‘2030-12-31’ EACH INTERVAL ‘1’ MONTH, NO RANGE, UNKNOWN);

INSERT INTO TestTable_PPI_Less_PartitionsSELECT * FROM TestTable_PI;

TableName (Sum(CurrentPerm)/(1024**3))TestTable_PPI_Less_Partitions 4,42

The table size is now only 4.42 gigabytes. The reduction in space can be explained by the fact that more rows can now be accommodated in a data block (approx. 30 times more rows per partition). This, in turn, increases the efficiency of block-level compression.

Conclusions

As we have seen, we can control the size of tables very well using the parameters, number of partitions, and Data block size.

Which setup we need and when depends on the workload we have to serve. Small data blocks and detailed partitioning are advantageous for tactical workloads but, at the same time, cause larger tables if the data demography would allow efficient BLC otherwise.

A strategic workload characterized by full table scans can be well covered by large data blocks and fewer partitions.

Please note that we modeled our test table to show significant changes, and in reality, the achievable savings may be different. But it should have shown you what possibilities you have to model your tables according to the requirements.

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

You might also like

>