What is the meaning of the Teradata DatablockSize option?
Most of us are familiar with the Teradata option, DATABLOCKSIZE, included in the CREATE TABLE statement. However, only a few know how to use it to reduce disk IOs. This article provides a brief overview of how to utilize this option effectively.
Teradata stores data in data blocks consisting of multiple sectors on a disk cylinder.
Normally, a data block solely contains rows from a particular table, with no intermixing of rows from different tables. As such, the block header of each data block includes a table id that establishes the table to which the block pertains.
Columnar storage was introduced in Teradata 14.10, allowing for the compression of multiple rows’ values into a single data block for one or more columns.
The DATABLOCKSIZE specifies the number of sectors composing a data block, each with a default size of 512 bytes. Teradata appends an additional 512-byte sector to the data block upon reaching full capacity, continuing until the maximum block size is achieved.
Teradata will divide a data block into two blocks when it reaches the maximum block size.
What does this mean for us? Storing the same number of rows can be achieved through a few large data blocks or multiple smaller ones. Inserting rows may lead to more frequent block splits in smaller data blocks.
The optimal data block size is contingent upon your workload specifications.
Tables accessed through full table scans should ideally have a larger data block size, as this allows for the efficient transfer of multiple rows into the memory of the AMPs.
Suppose your table has many distinct index accesses with only a few row retrievals. In that case, it is advisable to establish diminutive data blocks to prevent overcrowding in the AMP memory during data retrieval.