This article explores the various methods for adding partitions to a Row Partitioned Table and determines the most efficient approach. Previously, we conducted a similar study to analyze the resource impact of modifying the structure of an unpartitioned table. During that study, we evaluated the effectiveness of INSERT…SELECT versus ALTER TABLE. For more information, please refer to our previous report.
Partitioning can be accomplished in two ways: by duplicating data into a new table or by utilizing an ALTER TABLE statement. Before proceeding, assessing the necessary resources for both methods is important. To do this, we will generate a test table with row partitions and populate it with 100,000 records.
As is customary, I will substantiate my assertions with tests. The test configuration employs a table comprising approximately 100,000 rows.
The date column in our test table spans from 1900-01-01 to 2022-12-31, resulting in an empty NO RANGE partition.
CREATE MULTISET TABLE PPI_Test
(
PK INTEGER NOT NULL,
COL_A DECIMAL(18,2),
TheDate DATE FORMAT 'YYYY-MM-DD'
) PRIMARY INDEX (PK) PARTITION BY (RANGE_N(TheDate BETWEEN DATE'1900-01-01' AND DATE'2022-12-31' EACH INTERVAL '1' DAY, NO RANGE));
We add daily partitions for the year 2023 using ALTER TABLE statement:
ALTER TABLE PPI_Test MODIFY PRIMARY INDEX ADD RANGE BETWEEN DATE '2023-01-01' AND DATE '2023-12-31' EACH INTERVAL '1' DAY;
Creating new partitions is a streamlined process as they do not require any row redistribution. The sole exception would be if there were an abundance of rows in the NO RANGE partition that would need to be redistributed to the newly created day partitions. However, this is not applicable in this scenario. As a result, the ALTER TABLE command incurs only 110 IOs and necessitates no spool space.
Now, let’s tackle the identical task with INSERT…SELECT statements:
CREATE MULTISET TABLE PPI_Test_Copy
(
PK INTEGER NOT NULL,
COL_A DECIMAL(18,2),
TheDate DATE FORMAT 'YYYY-MM-DD'
) PRIMARY INDEX (PK) PARTITION BY (RANGE_N(TheDate BETWEEN DATE'1900-01-01' AND DATE'2023-12-31' EACH INTERVAL '1' DAY, NO RANGE));
INSERT INTO PPI_Test_Copy
SELECT * FROM PPI_Test;
Upon revisiting the QueryLog, we examine the pivotal figures. The utilization of the INSERT…SELECT statement necessitates 228 IOs and compels the whole PPI_Test table to be deposited in the spool.
What occurs if rows need to be redistributed from the NO RANGE partition to new partitions? Prior to appending new partitions for 2023, we inserted numerous rows into our table with a date between January 1st and December 31st of 2023 for testing purposes. The same issue arises when a DROP RANGE is performed, necessitating the movement of rows to the NO RANGE partitions.
The logical input/output operations in our test configuration rose to 952, signifying an 865% increase compared to the test configuration that lacked modifications for the NO RANGE partition. Additionally, this procedure uses some spool space, although typically not as much as the INSERT INTO…SELECT method.
Here are the summarized results of our 3 test setups:
Method of adding partitions to PPI table | Logical IOs |
ALTER TABLE without NO RANGE rows | 110 |
ALTER TABLE with NO RANGE rows | 952 |
INSERT INTO … SELECT into table copy | 248 |
Summary
The ALTER TABLE method is generally more efficient. Still, it’s important to note that the number of IOs can significantly increase if there is a pre-populated NO RANGE partition and many rows need to be moved.
Efficiency is not the only factor to consider in a production environment. When using ALTER TABLE, an exclusive lock is placed on the table, which cannot be interrupted. Therefore, it is crucial to prevent significant data redistribution resulting from filled NO RANGE partitions, as it could potentially immobilize the entire Teradata system.
It is not recommended to rely solely on ALTER TABLE as the most efficient method without considering various factors affecting resource usage. Prior to repartitioning a significant table, it is advisable to conduct comprehensive testing.
Thanks Roland.
Is there anything like automatic partition roll out in Teradata/latest versions of Teradata?
I mean, the partitioning extends by itself for upcoming Years/months/dates etc?