Teradata Big Data Blocks – Introduction
Starting with Teradata 13.10, the size of file system components (cylinders and data blocks) has been increased. In Teradata 13.10 the cylinder size was increased to 11.3 Megabytes (from 1.9 MB), and in Teradata 14.10 the limit of data block size was increased to 1 Megabyte (from 127.5KB).
While the table row size still is limited to 64 Kilobytes, starting with Teradata 14.10, the temporary rows being assembled in spool space can be up to 1 MB in size. Still, the row size limit for the last spool (the result set returned to the client) stayed at 64KB, but this limit probably will be removed in a future release of Teradata.
All of these measures together can help to achieve better system performance but may decrease performance in certain situations. We will discuss advantages and disadvantages of these changes in detail.
The increased data block size allows storing more rows per data block. As a result, with each I/O operation, a larger amount of rows can be written and read. Larger rows lead to performance advantages:
- Firstly, the performance of queries which use large spool tables can be improved. Typically, these are queries doing full table scans, such as queries related to strategic workload.
- Full table scan performance improvement is not restricted to read operations but includes updates and insert…select statements.
- The performance of TPT jobs with an update, insert and delete steps can be improved.
- Depending on the nature of the data, big data blocks can improve the compression factor of block level compression (BLC). With each data block holding a larger amount of rows, the algorithms used for block-level compression may be able to achieve better results when scanning the rows for common patterns being compressed.
- The performance of tables containing wide rows will be improved. As an example: When using 127.5 KB sized data blocks, only one 64KB row fits into each data block. The system needs one I/O to read each row. By increasing the data block size, I/O operations will become more efficient.
- The performance of sort operations can be faster than with smaller data blocks as each sorted buffer can hold and work more rows at the same time.
Above mentioned performance improvements will increase the available memory per AMP.
- Like hinted in the last statement above, big data blocks will need more memory for steps such as reading, writing, sorting of data and joins of partitioned tables. Insufficient AMP memory can be the result.
- The performance of tactical workload can decrease. As tactical workload is characterized by index access (such as the primary index), typically only one or a few rows are located. As Teradata has to move 1MB instead of 127.5 KB from the disk to the FSG cache – just to reach one row – the transfer costs are higher, and of course more memory is consumed.
- Similarly, for the same reason, TPT Stream and TPump performance can decrease.
- The performance of row partitioned tables can reduce. There are two grounds for this: First of all, the sliding window merge join is mostly based on available memory.If one data blocks from each non-eliminated partition fit into memory, the NPPI table only has to be once.Unfortunately, the bigger data blocks consume more memory, probably leading to a situation that the NPPI table has to be read several times. Reading a table more than once will decrease join performance. As you can image, the risk will be much higher for tables with many partitions and just a few rows per partition.Secondly, using bigger data blocks will increase the chance that a data block holds rows from different partitions, and at the same time decreasing the advantage of row partitioning.
Although big data blocks can be helpful in improving performance, they are not suitable for each situation. Luckily, even after upgrading to 1MB data blocks, it can be decided on a per table basis if smaller blocks should be used.
While strategic workload will benefit from big data blocks, you should probably switch back to small blocks for tactical workload queries and row partitioned tables having a lot of partitions.