The Teradata DATABLOCKSIZE
What is the meaning of the Teradata DatablockSize option?
Certainly most of us have seen the Teradata DATABLOCKSIZE option in the CREATE TABLE statement, but I suppose only a few of us are familiar with the usage of this option. This article should give you a short overview how you can use it to minimize your disk IO's.
Teradata stores data rows (or column containers) in data blocks and each data block is composed of several sectors on a disk cylinder (for details about how disks are constructed take a look at http://en.wikipedia.org/wiki/Disk_cylinder )
Traditionally, each data block only store rows from the same table. Rows from different tables are never mixed up in one data block. For this reason each data block holds the table id in the block header which uniquely identifies the related table.
Since Teradata 14.10 and with the availibility of columnar storage, values of one or several columns of several rows are packed together into the same data block.
The DATABLOCKSIZE basically determines how many sectors piece together one data block. The default size for one sector is 512 bytes. Each time a sector is full, Teradata adds another 512 Bytes sector to the data block until the maximum block size is reached.
After a data block reached the maximum block size, Teradata will split the block into two blocks.
What does this all mean for us? Basically you can store the same amount of rows within a few big data blocks or many small ones. Smaller data blocks will result in more frequent block splits if rows are inserted.
Which data block size is the best depends on the requirements of your workload.
Tables which are accessed most of the time by full table scans should probably have a big data block size. This way, many rows can be moved efficiently into the AMPs memory at once.
On the other hand, if you have a lot of unique index accesses on your table which means only a few rows are accessed, it will be better to define small data blocks in order to avoid to clutter up the AMPs memory each time you retrieve the data.