In this article, we will look in detail at what options there are for adding new partitions to a Row Partitioned Table and which is more efficient. We have already done a similar experiment where the goal was to measure resource impact when changing the structure of a non-partitioned table. At that time, we compared INSERT…SELECT with ALTER TABLE. You can read the details here:
Adding partitions can be done in two different ways. We can again copy the data into a new table or use an ALTER TABLE statement. To calculate the required resources for both variants, we first create a row partitioned test table and load it with 100,000 rows.
As usual, I will underline my claims with some tests. The test setup uses a table containing about 100 000 rows.
Our test table contains dates between 1900-01-01 and 2022-12-31 in the TheDate column, so there are no rows in the NO RANGE partition:
CREATE MULTISET TABLE PPI_Test
(
PK INTEGER NOT NULL,
COL_A DECIMAL(18,2),
TheDate DATE FORMAT 'YYYY-MM-DD'
) PRIMARY INDEX (PK) PARTITION BY (RANGE_N(TheDate BETWEEN DATE'1900-01-01' AND DATE'2022-12-31' EACH INTERVAL '1' DAY, NO RANGE));
We add daily partitions for the year 2023 using ALTER TABLE statement:
ALTER TABLE PPI_Test MODIFY PRIMARY INDEX ADD RANGE BETWEEN DATE '2023-01-01' AND DATE '2023-12-31' EACH INTERVAL '1' DAY;
Adding new partitions is efficient since they only need to be created, but no redistribution of rows needs to be done. The only exception would be if there were many rows in the NO RANGE partition, which would have to be redistributed to the new day partitions. This is not the case here. Therefore the ALTER TABLE causes only 110 IOs and needs no spool space.
Next, let’s solve the same task using INSERT…SELECT statements:
CREATE MULTISET TABLE PPI_Test_Copy
(
PK INTEGER NOT NULL,
COL_A DECIMAL(18,2),
TheDate DATE FORMAT 'YYYY-MM-DD'
) PRIMARY INDEX (PK) PARTITION BY (RANGE_N(TheDate BETWEEN DATE'1900-01-01' AND DATE'2023-12-31' EACH INTERVAL '1' DAY, NO RANGE));
INSERT INTO PPI_Test_Copy
SELECT * FROM PPI_Test;
We look again at the key figures in the QueryLog. The INSERT…SELECT statement requires 228 IOs and must place the entire PPI_Test table in the spool.
But what happens if Rows have to be redistributed from the NO RANGE partition to the new partitions? We have test inserted many rows with a date between 2023-01-01 and 2023-12-31 into our table before we appended new partitions for 2023. Of course, we have the same problem when we do a DROP RANGE, and Rows have to be moved to the NO RANGE partitions.
In our test setup, logical IOs increased to 952, which is 865% more IOs than the test setup without changes for the NO RANGE partition! Furthermore, this process consumes some spool space (but usually not as much as the INSERT INTO…SELECT approach).
Here are the summarized results of our 3 test setups:
Method of adding partitions to PPI table | Logical IOs |
ALTER TABLE without NO RANGE rows | 110 |
ALTER TABLE with NO RANGE rows | 952 |
INSERT INTO … SELECT into table copy | 248 |
Summary
As we have seen, ALTER TABLE is typically the more efficient method. However, we must consider that the number of IOs can increase drastically if there is a ready-filled NO RANGE partition and masses of rows are moved.
In a production environment, however, we must consider other factors besides efficiency. ALTER TABLE requires an exclusive lock on the table and cannot be stopped. So it is essential to ensure that there is not a massive redistribution of data caused by filled NO RANGE partitions because then the whole Teradata system could be paralyzed.
Please do not take this test as proof that ALTER TABLE is always more efficient. Many other factors influence resource consumption. Therefore I recommend you always to test before you repartition a large table.
Thanks Roland.
Is there anything like automatic partition roll out in Teradata/latest versions of Teradata?
I mean, the partitioning extends by itself for upcoming Years/months/dates etc?