Teradata Big Data Blocks – Introduction
Starting with Teradata 13.10, the size of the file system components (cylinders and data blocks) increased. In Teradata 13.10, the cylinder size grew from 1.9 MB to 11.3 MB, and in Teradata 14.10, the maximum data block size grew from 127.5 KB to 1 MB.
While the table row size is still limited to 64 KB, starting with Teradata 14.10, the temporary rows assembled in spool space can be up to 1 MB in size. The row size limit for the last spool (the result set returned to the client) stayed at 64 KB, but this limit will probably be removed in a future Teradata release.
Together, these changes can improve overall system performance, but they may also decrease performance in certain situations. We will discuss the advantages and disadvantages of these changes in detail.
The increased data block size allows more rows to be stored per data block. As a result, each I/O operation can read or write more rows. Larger data blocks bring several performance advantages:
- The performance of queries that use large spool tables can improve. Typically, these are queries doing full table scans, such as those belonging to the strategic workload.
- The full table scan improvement is not restricted to read operations; it also applies to UPDATE and INSERT…SELECT statements.
- The performance of TPT jobs with update, insert, and delete steps can improve.
- Depending on the nature of the data, big data blocks can improve the compression factor of block-level compression (BLC). With each data block holding a larger number of rows, the compression algorithms can find more common patterns when scanning the rows.
- The performance of tables containing wide rows can improve. For example, with 127.5 KB data blocks, only one 64 KB row fits into each data block, so the system needs one I/O to read each row. With larger data blocks, each I/O operation covers more rows and becomes more efficient.
- Sort operations can be faster than with smaller data blocks, as each sort buffer can hold and process more rows at once.
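The I/O savings described above come down to simple arithmetic. The following sketch (illustrative only, not Teradata internals; row and table sizes are made-up assumptions) counts the I/Os a full table scan needs at the old and new maximum block sizes:

```python
# Illustrative arithmetic: I/Os needed to scan a table at different
# data block sizes. All sizes are in bytes; one I/O per data block.
def ios_for_scan(row_bytes: int, row_count: int, block_bytes: int) -> int:
    rows_per_block = max(block_bytes // row_bytes, 1)  # whole rows per block
    return -(-row_count // rows_per_block)             # ceiling division

KB = 1024
# Assumed example table: 1,000,000 rows of 2 KB each
old = ios_for_scan(2 * KB, 1_000_000, int(127.5 * KB))  # 127.5 KB blocks
new = ios_for_scan(2 * KB, 1_000_000, 1024 * KB)        # 1 MB blocks
print(old, new)  # the 1 MB blocks need roughly 8x fewer I/Os
```

The exact ratio depends on the row width, but for any scan-heavy workload the number of I/Os shrinks roughly in proportion to the block size increase.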
The performance improvements mentioned above, however, come at the price of increased memory demand per AMP.
- As hinted above, big data blocks need more memory for steps such as reading, writing, sorting, and joins of partitioned tables. Insufficient AMP memory can be the result.
- The performance of the tactical workload can decrease. Tactical workload is characterized by index access (such as via the primary index), where typically only one or a few rows are retrieved. Since Teradata has to move 1 MB instead of 127.5 KB from disk into the FSG cache to reach a single row, transfer costs are higher, and more memory is consumed.
- For the same reason, TPT Stream and TPump performance can decrease.
- The performance of row partitioned tables can degrade. There are two reasons for this: First, the sliding window merge join depends largely on available memory. If one data block from each non-eliminated partition fits into memory, the NPPI table only has to be read once. Unfortunately, bigger data blocks consume more memory, possibly forcing the NPPI table to be read several times. Reading a table more than once decreases join performance, and the risk is much higher for tables with many partitions and only a few rows per partition. Second, bigger data blocks increase the chance that a data block holds rows from several partitions, which reduces the advantage of row partitioning.
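The sliding window constraint described in the last bullet can be sketched as follows. This is a deliberately simplified model (the partition counts and memory budget are made-up assumptions; real Teradata join planning is more involved): if one buffered data block per non-eliminated partition no longer fits into memory, the NPPI table must be rescanned once per "window" of partitions:

```python
import math

# Simplified model of the sliding window merge join memory constraint:
# one data block per non-eliminated partition must be buffered in memory;
# otherwise the NPPI side is rescanned for each window of partitions.
def nppi_scan_passes(partitions: int, block_bytes: int, memory_bytes: int) -> int:
    blocks_in_memory = max(memory_bytes // block_bytes, 1)
    return math.ceil(partitions / blocks_in_memory)

MB = 1024 * 1024
# Assumed example: 200 non-eliminated partitions, 100 MB of join memory
passes_small = nppi_scan_passes(200, int(127.5 * 1024), 100 * MB)  # 1 pass
passes_big   = nppi_scan_passes(200, 1 * MB, 100 * MB)             # 2 passes
print(passes_small, passes_big)
```

With 127.5 KB blocks, all 200 partition buffers fit into the 100 MB budget and the NPPI table is read once; with 1 MB blocks, only 100 buffers fit, so the NPPI table is read twice.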
Although big data blocks can help improve performance, they are not suitable for every situation. Luckily, even after upgrading to 1 MB data blocks, you can decide on a per-table basis whether smaller blocks should be used (via the table's DATABLOCKSIZE attribute).
While the strategic workload will benefit from big data blocks, you should probably switch back to small blocks for tactical workload queries and for row partitioned tables with many partitions.
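As a rough rule of thumb, the guideline above could be encoded like this. The function, the workload labels, and the partition threshold are illustrative assumptions for this sketch, not Teradata settings:

```python
# Hypothetical decision helper mirroring the guideline above.
# Workload labels and the 100-partition threshold are illustrative only.
def suggested_block_kb(workload: str, partitions: int = 0) -> float:
    if workload == "tactical":            # index access, few rows per request
        return 127.5                      # keep the smaller pre-14.10 block size
    if workload == "strategic" and partitions <= 100:
        return 1024.0                     # full table scans benefit from 1 MB blocks
    return 127.5                          # many-partition tables: stay small

print(suggested_block_kb("strategic"))               # 1024.0
print(suggested_block_kb("strategic", partitions=500))  # 127.5
```

In practice, the right threshold depends on your AMP memory, partition counts, and workload mix; treat this as a starting point for per-table tuning, not a fixed rule.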