Teradata Big Data Blocks – Introduction
Teradata 13.10 introduced larger file system components, such as cylinders and data blocks. The cylinder size was increased from 1.9 MB to 11.3 MB, and with Teradata 14.10 the maximum data block size was raised from 127.5 KB to 1 MB.
Teradata 14.10 allows temporary rows in spool space to reach a maximum size of 1 MB, although table rows are still limited to 64 KB. The last spool – the result set returned to the client – is also still limited to 64 KB rows, though Teradata plans to remove this restriction in future releases.
These changes can improve system performance, but in certain situations they can also degrade it. Below is an analysis of the benefits and drawbacks of these modifications.
Larger data blocks store more rows per block, so each I/O operation reads or writes more rows. This brings several performance benefits:
- Firstly, queries that use large spool tables can run faster. Typically, these are queries doing full table scans, such as strategic workload queries.
- The full table scan improvement is not restricted to read operations; it includes updates and INSERT…SELECT statements.
- The performance of TPT jobs with update, insert, and delete steps can improve.
- Depending on the nature of the data, big data blocks can improve the compression factor of block-level compression (BLC). With each data block holding a larger number of rows, the BLC algorithms may achieve better results when scanning the rows for common patterns.
- Tables containing wide rows benefit as well. With 127.5 KB data blocks, only one 64 KB row fits into each data block, so the system needs one I/O per row. A 1 MB data block holds up to sixteen 64 KB rows, making each I/O operation far more efficient.
- Sort operations can be faster than with smaller data blocks, as each sort buffer can hold and process more rows at a time.
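To make the I/O savings concrete, here is a back-of-envelope sketch in Python. The table size and row width are made-up example figures, and real block fill also depends on block headers and free space, which are ignored here:

```python
# Back-of-envelope I/O count for a full table scan, assuming a
# hypothetical table of 10 million rows averaging 200 bytes each.
# Block sizes are the Teradata limits discussed above (127.5 KB vs 1 MB).

def full_scan_ios(row_count, row_bytes, block_bytes):
    rows_per_block = block_bytes // row_bytes
    # Each I/O reads one data block; round the block count up.
    return -(-row_count // rows_per_block)  # ceiling division

small = full_scan_ios(10_000_000, 200, 127_500)    # 127.5 KB blocks
large = full_scan_ios(10_000_000, 200, 1_048_576)  # 1 MB blocks
print(small, large, round(small / large, 1))  # → 15699 1908 8.2
```

Under these assumptions, the scan needs roughly eight times fewer I/O operations with 1 MB blocks, which is where the full-table-scan benefit comes from.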
The performance improvements mentioned above come at a price: big data blocks increase the memory demand per AMP. This causes the following drawbacks:
- Big data blocks need more memory for reading, writing, and sorting data, and for joining partitioned tables. Insufficient AMP memory can be the result.
- The performance of tactical workload can decrease. Tactical workload is characterized by index access (such as primary index access) that typically locates only one or a few rows. To reach a single row, Teradata has to move 1 MB instead of 127.5 KB from disk into the FSG cache – the transfer costs are higher, and more memory is consumed.
- Similarly, TPT Stream and TPump performance can decrease for the same reason.
- The performance of row-partitioned tables can decrease, for two reasons. First, the sliding-window merge join depends mainly on available memory: if one data block from each non-eliminated partition fits into memory, the NPPI table has to be read only once. Bigger data blocks consume more memory, possibly forcing Teradata to read the NPPI table several times, and reading a table more than once decreases join performance. The risk is much higher for tables with many partitions and only a few rows per partition. Second, bigger data blocks increase the chance that a single data block holds rows from different partitions, reducing the advantage of row partitioning.
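The tactical-access penalty is easy to quantify: whatever the row size, a single-row lookup transfers the whole containing data block into the FSG cache. A small illustration in Python (the 200-byte row size is a made-up example):

```python
# Bytes moved from disk into the FSG cache to locate one row via index
# access: the entire containing data block is transferred, regardless of
# the row size. Block sizes are the limits discussed above.

def wasted_bytes(block_bytes, row_bytes):
    # Transferred bytes that do not belong to the requested row.
    return block_bytes - row_bytes

row = 200  # hypothetical average row size in bytes
print(wasted_bytes(127_500, row))    # → 127300  (127.5 KB blocks)
print(wasted_bytes(1_048_576, row))  # → 1048376 (1 MB blocks)
```

With 1 MB blocks, each lookup moves roughly eight times as much data as with 127.5 KB blocks, which explains the higher transfer cost and memory consumption for tactical queries.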
Big data blocks can enhance performance but do not suit every situation. Fortunately, the block size can be chosen on a per-table basis, so individual tables can keep smaller blocks even after upgrading to 1 MB data blocks.
Big data blocks are ideal for strategic workloads, but it is advisable to revert to small blocks for tactical workload queries and row-partitioned tables with numerous partitions.
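Reverting a single table to small blocks is typically done with the DATABLOCKSIZE option of ALTER TABLE. A minimal sketch that builds such a statement – the table name is made up, and the exact clause syntax should be verified against your Teradata release:

```python
# Sketch: build an ALTER TABLE statement that reverts one table to small
# data blocks. The table name is hypothetical. IMMEDIATE asks Teradata to
# repack existing data blocks right away rather than on the next write.

def revert_to_small_blocks(table, kilobytes=127.5):
    return (f"ALTER TABLE {table}, "
            f"DATABLOCKSIZE = {kilobytes} KILOBYTES IMMEDIATE;")

print(revert_to_small_blocks("tactical_orders"))
# → ALTER TABLE tactical_orders, DATABLOCKSIZE = 127.5 KILOBYTES IMMEDIATE;
```

A statement like this could be generated for each table identified as serving tactical queries or as heavily row-partitioned, while the rest of the system keeps the 1 MB default.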