ALTER Table versus INSERT INTO
Changing the structure of a Teradata table can require a lot of resources for large tables. In principle, two methods are available. We can modify the DDL using ALTER TABLE or create an empty table with the desired DDL statement and copy the data using the Teradata INSERT INTO statement. Both methods have their advantages and disadvantages.
To compare both methods, we created a test scenario. The test table had an average data block size of 127KB, and we filled it with very short rows to accommodate several hundred rows per data block.
Next, we executed both an ALTER TABLE statement and an INSERT INTO statement. It proved that the ALTER TABLE method is more efficient; it caused 133 I/Os, whereas the INSERT INTO strategy caused 322 I/Os. The ALTER TABLE statement required 59% less I/O than an INSERT INTO statement in Teradata. The required CPU seconds were about the same.
Then we changed the data demography of the table and filled it with very wide rows. This meant that there was only room for 1-2 rows per data block. Again we made an ALTER TABLE statement as well as an INSERT INTO statement.
Here a significant advantage of the ALTER TABLE method was shown. It needed 539 I/Os, while the INSERT INTO strategy needed 270853 I/Os. The ALTER TABLE method required an impressive 99.8% less I/Os!
We perform many further comparisons in which we varied either the data block size or the row size (or both simultaneously). The result was obvious. The fewer rows per data block, the greater the I/O savings for the ALTER TABLE method.
Below we can see a table showing the results (logical and phyiscal I/Os):
|Logical I/Os||Physical I/Os|
|INSERT BIG ROW||270 853||12 992|
|ALTER BIG ROW||539||15|
|INSERT SMALL ROW||322||30|
|ALTER SMALL ROW||113||14|
However, this does not mean that we should always use ALTER TABLE, as some disadvantages need to be weighed up carefully:
ALTER TABLE can’t be aborted. This may lead to unexpected situations, as the following example shows; if a workload management rule should kill all sessions consuming more than 100.000 CPU seconds and the ALTER TABLE reaches this limit, the session fails after the ALTER TABLE finished.
ALTER TABLE exclusively locks the table, while the Teradata INSERT INTO statement only needs a read lock. A long-running ALTER TABLE can delay other workloads for a long time. Finally, it might be considered risky to make changes to the original table. The INSERT INTO statement works on a copy of the table, and risk is therefore less.
ALTER TABLE adds new columns at the end of the table. If we need it in a specific position, we have to use the INSERT INTO statement,
Should we use Teradata INSERT INTO or ALTER TABLE?
ALTER TABLE is undoubtedly the better alternative from a performance tuning point of view. If we can live in a concrete scenario with the limitations we mentioned above, it is the first choice. Otherwise, it is better to think about optimized forms of the Teradata INSERT INTO statement.