This article will highlight the contrasting architectural features of Teradata and SQL Server, along with their shared aspects and performance impacts. The piece titled “Teradata versus SQL Server” will be periodically updated.
What should be considered for the performance of COUNT(*)?
Teradata usually has a clear performance advantage over SQL Server. The number of rows is counted either on the base table’s cylinder index (typically cached) or an index.
The SQL Server can also use an index (thereby reducing the number of I/Os required) but must otherwise make a full table scan. The performance difference here can be enormous. Simultaneously, SQL Server may need to optimize a COUNT(*) query. Teradata does not need to do this.
A full table scan on SQL Server harms performance when only a few rows fit into a page. The fewer rows per page, the more I/Os are needed for the full table scan and the longer the query takes. We also know this problem from Teradata, where we can optimize the data block size to reduce the performance impact of large rows.
How do large objects affect performance?
Teradata stores large objects in a separate sub-table. Therefore, these columns should only be queried when needed to keep the I/Os number to a minimum.
The principle is similar to SQL Server. Large objects are stored in separate LOB data pages. Depending on the LOB’s number of bytes, a B-tree is needed to manage the chunks. The structure is much more complex. It is even more critical than with Teradata to avoid SELECT * queries to avoid reading LOB data pages and thus minimize the I/Os.
How are data blocks (Teradata) or pages (SQL Server) managed?
In Teradata, data blocks grow dynamically up to a maximum size. A block split occurs, and half of the rows are placed in each data block. Conversely, two data blocks with few rows are merged if they contain only a few rows due to a DML operation. This happens automatically in the background. If a row in a data block is updated and does not fit into the same data block anymore, it is written into another (new) data block. This applies to PI as well as NOPI tables.
SQL Server solves the problem described above differently. For (unsorted) heap tables (typically used for staging like Teradata NOPI tables), the rows are stored in pages with static size. These pages are filled up with the inserted rows. If an update occurs in which a row is enlarged and does not fit in the original position, it is moved to another page. Only a forwarding pointer to the new row remains on the original page. This approach is used to keep the number of I/Os low; on the one hand, non-clustered indexes do not need to be updated with this approach; on the other hand, it prevents a row from being read multiple times. Let us assume that a row on page 1 is changed, and the updated row is stored on page 3. If page 2 were read at the same time (pages are always scanned sequentially), the same (but changed) row would be reread from page 3 (which was already read from page 1). Without Forwarding Pointer, rows would be read multiple times.
Forwarding pointers reduce the number of I/Os, as mentioned above, but have a significant disadvantage: the number of forwarding pointers increases over time which then causes additional I/Os. There is no automatic restructuring for heap tables.
An ALTER TABLE REBUILD statement must be executed.
These disadvantages of SQL Server can be removed by adding a Clustered Index (one per table that sorts the data), but this is costly and is expensive as it consumes additional storage space.
Teradata Ordered Index versus SQL Server Clustered Index – what are the differences?
An ordered index is used to efficiently query ranges of values.
Teradata allows NUSI and Join Index to be created as ordered index. The Index is sorted by a 4-byte integer value (including the datatype date). These indexes must only contain the required columns and thus be optimized for space consumption. Teradata elegantly uses Master and Cylinder Indexes for a value-ordered index. Instead of the sorted ROWHASH values, the sorted column values are stored:
CREATE INDEX (CustomerId, LastName) ORDER BY VALUES (OpenDate) ON Customer;
SQL Server achieves value ordering with the Clustered Index. For this purpose, an ordered index is created on a heap table (which is a full-column copy of the heap table but sorted differently using a double-chained list):
CREATE CLUSTERED INDEX IDX_Customer ON dbo.Customer(OpenDate);
The data itself is unsorted within the pages. The so-called slot array (equivalent to the array of pointers in each Teradata data block) sorts them.
The double-chained list is also called leaf level. If it consists of multiple pages, intermediate levels are created. At the top of the list, there is a single root level. Data can be scanned in the defined order starting from the root level.