Teradata Big Data Blocks – Introduction
Starting with Teradata 13.10, the size of file system components (cylinders and data blocks) increased. In Teradata 13.10, the cylinder size was increased to 11.3 MB (from 1.9 MB), and in Teradata 14.10, the maximum data block size was increased to 1 MB (from 127.5 KB).
While the table row size is still limited to 64 KB, starting with Teradata 14.10 the temporary rows assembled in spool space can be up to 1 MB in size. The row size limit for the final spool (the result set returned to the client) remains 64 KB, although Teradata will probably remove this limit in a future release.
These measures can help achieve better system performance but may decrease performance in certain situations. We will discuss the advantages and disadvantages of these changes in detail.
The increased data block size allows more rows to be stored per data block. As a result, each I/O operation reads or writes more rows. Larger data blocks bring the following performance advantages:
- Firstly, the performance of queries that use large spool tables can be improved. Typically, these are queries doing full table scans, such as queries related to strategic workload.
- Full table scan performance improvement is not restricted to read operations but includes updates and insert…select statements.
- The performance of TPT jobs with update, insert, and delete steps can be improved.
- Depending on the nature of the data, big data blocks can improve the compression factor of block-level compression (BLC). With each data block holding a larger number of rows, the algorithms used for block-level compression may achieve better results when scanning the rows for common patterns.
- Big data blocks improve the performance of tables containing wide rows. For example, with 127.5 KB data blocks, only one 64 KB row fits into each data block, so the system needs one I/O to read each row. With a larger data block size, several wide rows fit into one block, and each I/O operation becomes more efficient.
- Sort operations can be faster than with smaller data blocks, as each sort buffer can hold and process more rows at once.
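The wide-row case from the list above can be checked with a little arithmetic. The following sketch is only an illustration: it treats 64 KB, 127.5 KB, and 1 MB as nominal byte counts and ignores block header overhead, so real counts would be somewhat lower. The function names and the 1,000-row table are hypothetical.

```python
import math

KB = 1024
SMALL_BLOCK = int(127.5 * KB)   # 130,560 bytes
LARGE_BLOCK = 1024 * KB         # nominal 1 MB
WIDE_ROW = 64 * KB              # nominal maximum table row size


def rows_per_block(block_bytes: int, row_bytes: int) -> int:
    """Whole rows that fit into one data block (header overhead ignored)."""
    return block_bytes // row_bytes


def scan_ios(row_count: int, block_bytes: int, row_bytes: int) -> int:
    """Block reads needed to scan row_count rows, assuming fully packed blocks."""
    return math.ceil(row_count / rows_per_block(block_bytes, row_bytes))


if __name__ == "__main__":
    for name, size in (("127.5 KB", SMALL_BLOCK), ("1 MB", LARGE_BLOCK)):
        print(name, rows_per_block(size, WIDE_ROW), scan_ios(1000, size, WIDE_ROW))
```

Under these assumptions, scanning 1,000 wide rows takes 1,000 block reads with 127.5 KB blocks and 63 with 1 MB blocks.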
The performance improvements mentioned above increase the memory required per AMP. This leads to the following disadvantages:
- As hinted in the statement above, big data blocks need more memory for reading, writing, and sorting data, and for joining partitioned tables. The result can be insufficient AMP memory.
- The performance of tactical workload can decrease. Tactical workload is characterized by index access (such as via the primary index), which typically locates only one or a few rows. To reach a single row, Teradata has to move up to 1 MB instead of 127.5 KB from disk into the FSG cache, so transfer costs are higher and more memory is consumed.
- Similarly, TPT Stream and TPump performance can decrease for the same reason.
- The performance of row-partitioned tables can degrade, for two reasons. First, the sliding window merge join depends heavily on available memory. If one data block from each non-eliminated partition fits into memory, the NPPI table must be read only once. Bigger data blocks consume more memory, which makes it more likely that Teradata must read the NPPI table several times, and reading a table more than once decreases join performance. The risk is much higher for tables with many partitions and just a few rows per partition. Second, bigger data blocks increase the chance that a data block holds rows from different partitions, which diminishes the advantage of row partitioning.
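The memory-related effects in this list can be illustrated with a deliberately simplified model. It is not the optimizer's actual costing: the memory budget, the partition count, and both function names are hypothetical. It assumes that each index access reads one full data block and that the join keeps exactly one block per non-eliminated partition in memory.

```python
import math

KB = 1024
SMALL_BLOCK = int(127.5 * KB)   # 130,560 bytes
LARGE_BLOCK = 1024 * KB         # nominal 1 MB


def transfer_amplification(large_block: int, small_block: int) -> float:
    """Factor by which bytes moved per single-row lookup grow with big blocks."""
    return large_block / small_block


def nppi_passes(partitions: int, block_bytes: int, memory_bytes: int) -> int:
    """Reads of the NPPI table if one block per partition must sit in memory."""
    blocks_in_memory = memory_bytes // block_bytes
    if blocks_in_memory == 0:
        raise ValueError("memory budget is smaller than one data block")
    return math.ceil(partitions / blocks_in_memory)


if __name__ == "__main__":
    budget = 8 * 1024 * KB  # hypothetical 8 MB available to the join
    print(round(transfer_amplification(LARGE_BLOCK, SMALL_BLOCK), 2))
    print(nppi_passes(40, SMALL_BLOCK, budget), nppi_passes(40, LARGE_BLOCK, budget))
```

With 40 non-eliminated partitions and the assumed 8 MB budget, the model yields one pass over the NPPI table for 127.5 KB blocks and five passes for 1 MB blocks. A single-row lookup moves roughly eight times as many bytes.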
Although big data blocks can help improve performance, they are not suitable for every situation. Fortunately, even after upgrading to 1 MB data blocks, we can decide on a per-table basis whether to use smaller blocks.
While strategic workload will benefit from big data blocks, you should probably switch back to small blocks for tables accessed by tactical queries and for row-partitioned tables with many partitions.
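The rule of thumb above could be captured in a small helper that proposes a block size per table. This is only a sketch: the `TableProfile` fields, the threshold of 100 partitions, and the table names are hypothetical. The generated statement assumes Teradata's `DATABLOCKSIZE` table option with `IMMEDIATE`; check the exact syntax and the allowed values in the SQL reference for your release before using it.

```python
from dataclasses import dataclass
from typing import Optional

SMALL_BLOCK_BYTES = 130560  # 127.5 KB, the pre-14.10 maximum


@dataclass
class TableProfile:
    name: str
    tactical_access: bool      # mostly index-based lookups of a few rows
    partition_count: int = 0   # 0 means the table is not row partitioned


def wants_small_blocks(t: TableProfile, many_partitions: int = 100) -> bool:
    """Rule of thumb: tactical or heavily partitioned tables get small blocks."""
    return t.tactical_access or t.partition_count >= many_partitions


def small_block_ddl(t: TableProfile) -> Optional[str]:
    """Return an ALTER TABLE statement (assumed syntax), or None to keep big blocks."""
    if not wants_small_blocks(t):
        return None
    return f"ALTER TABLE {t.name}, DATABLOCKSIZE = {SMALL_BLOCK_BYTES} BYTES IMMEDIATE;"


if __name__ == "__main__":
    print(small_block_ddl(TableProfile("sales.customer_lookup", tactical_access=True)))
    print(small_block_ddl(TableProfile("sales.order_history", tactical_access=False)))
```

The threshold for "many partitions" is a placeholder; a sensible value depends on the memory available per AMP and on how many partitions typically survive partition elimination.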