Teradata ALTER Table versus INSERT INTO
Altering the structure of a substantial Teradata table can consume significant resources. Essentially, there are two approaches: altering the DDL through Teradata ALTER TABLE or generating an empty table with the desired DDL statement and transferring data via the Teradata INSERT INTO statement. Each method carries its own set of pros and cons.
We devised a test scenario to compare the methods using a table with an average data block size of 127KB. The table accommodated several hundred rows per data block, as we filled it with very short rows.
We performed an ALTER TABLE and an INSERT INTO query and found that the ALTER TABLE approach was more effective, resulting in 133 I/Os compared to the 322 I/Os of the INSERT INTO method. In Teradata, the ALTER TABLE query utilized 59% fewer I/Os than the INSERT INTO statement, with comparable CPU seconds.
We altered the data demography of the table and expanded the row width significantly, allowing for only 1-2 rows per data block. Subsequently, we executed an ALTER TABLE statement and an INSERT INTO statement.
The ALTER TABLE method demonstrated a notable advantage, as it only required 539 I/Os, in contrast to the INSERT INTO approach, which demanded 270853 I/Os. Thus, the ALTER TABLE method resulted in a remarkable 99.8% reduction of I/Os.
Through various comparisons with varied data blocks and row sizes, we have observed a clear trend. When there are fewer rows per data block, the ALTER TABLE method results in greater I/O savings.
Below we can see a table showing the results (logical and physical I/Os):
|Logical I/Os||Physical I/Os|
|INSERT BIG ROW||270 853||12 992|
|ALTER BIG ROW||539||15|
|INSERT SMALL ROW||322||30|
|ALTER SMALL ROW||113||14|
Nonetheless, utilizing ALTER TABLE should not be a default practice since its drawbacks require thorough consideration.
The execution of ALTER TABLE cannot be halted, which can result in unforeseen circumstances. For instance, if a workload management rule is set to terminate all sessions that consume over 100,000 CPU seconds and the ALTER TABLE surpasses this threshold, the session will fail only after the ALTER TABLE operation has concluded.
The Teradata INSERT INTO statement requires only a read lock, while ALTER TABLE results in an exclusive lock on the table. An extended ALTER TABLE process can significantly impede other workloads. Additionally, modifying the original table poses potential risks. On the other hand, the INSERT INTO statement operates on a duplicate table, mitigating the level of risk involved.
ALTER TABLE adds new columns at the end of the table. If we need it in a specific position, we have to use the INSERT INTO statement,
Should we use Teradata INSERT INTO or ALTER TABLE?
From a performance standpoint, ALTER TABLE is superior and should be prioritized when the aforementioned restrictions are acceptable. However, if these limitations are not feasible, exploring alternative, optimized forms of the Teradata INSERT INTO statement is advisable.