What is the meaning of the Teradata DatablockSize option?
Most of us have seen the Teradata DATABLOCKSIZE option in the CREATE TABLE statement, but I suppose only a few of us are familiar with how to use it. This article gives you a short overview of how you can use it to minimize your disk I/Os.
Teradata stores data rows (or column containers) in data blocks, and each data block is composed of several sectors on a disk cylinder (for details about how disks are constructed, take a look at http://en.wikipedia.org/wiki/Disk_cylinder ).
Traditionally, each data block only stores rows from the same table; rows from different tables are never mixed in one data block. For this reason, each data block holds the table id in its block header, uniquely identifying the table it belongs to.
Since Teradata 14.10 and the availability of columnar storage, the values of one or several columns from several rows can be packed together into the same data block.
The DATABLOCKSIZE setting determines how many sectors make up one data block. The default size of one sector is 512 bytes. Teradata adds another 512-byte sector to the data block each time a sector is full, until the maximum block size is reached.
After a data block reaches the maximum block size, Teradata splits it into two blocks.
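As a sketch of how this looks in practice, here is a CREATE TABLE statement that sets the maximum data block size explicitly. The table and column names are made up for illustration, and the usable DATABLOCKSIZE range depends on your Teradata release, so treat the concrete value as an assumption:

```sql
-- Hypothetical fact table; DATABLOCKSIZE caps the maximum data block size.
-- Teradata rounds the value to a whole number of 512-byte sectors.
CREATE TABLE Sales_Fact ,FALLBACK ,
     DATABLOCKSIZE = 127 KILOBYTES
(
   Sale_Id    INTEGER,
   Sale_Date  DATE,
   Amount     DECIMAL(18,2)
)
PRIMARY INDEX (Sale_Id);
```

The unit can also be given as BYTES or KBYTES; if you omit the option entirely, the table inherits the system default block size.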
What does this all mean for us? You can store the same number of rows in a few big data blocks or in many small ones. Smaller data blocks will result in more frequent block splits when rows are inserted.
Which data block size is best depends on your workload requirements.
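Before tuning, it helps to know what a table currently uses. SHOW TABLE returns the full DDL and, to my understanding, includes the DATABLOCKSIZE option if it was set explicitly (the table name below is the hypothetical one from above):

```sql
-- Returns the CREATE TABLE DDL; an explicitly set DATABLOCKSIZE
-- appears in the table option list at the top of the statement.
SHOW TABLE Sales_Fact;
```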
Most of the time, tables accessed by full table scans should have a big data block size, as many rows can be moved into the AMP's memory at once.
On the other hand, if you have a lot of unique index accesses on your table, which means only a few rows are accessed at a time, it is better to define small data blocks to avoid cluttering up the AMP's memory each time you retrieve data.
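Such a table can be switched to smaller blocks after the fact with ALTER TABLE. A minimal sketch, assuming a hypothetical lookup table accessed mostly via single-row index retrievals; the IMMEDIATE option, as I understand it, repacks the existing data blocks right away instead of waiting for future block splits and merges:

```sql
-- Hypothetical lookup table with mostly unique-index, single-row access:
-- shrink the maximum block size to 42 sectors (42 * 512 = 21504 bytes).
-- IMMEDIATE rewrites existing blocks to the new size straight away.
ALTER TABLE Customer_Lookup
     , DATABLOCKSIZE = 21504 BYTES IMMEDIATE;
```

Without IMMEDIATE, only blocks touched by future modifications would gradually adopt the new size.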