Teradata Partitioning – Row vs. Column
Partitioning tables using column or row partitioning can reduce resource usage when accessing data.
Row partitioning horizontally partitions tables, allowing the optimizer to limit data access to a cluster of rows. On the other hand, column partitioning vertically partitions tables, enabling the optimizer to restrict data access to a subset of columns.
The rows of partitioned tables are stored in physical clusters, which are groups of rows. For instance, a cluster may contain all the rows from a specific date.
If a cluster is utilized in a query condition, the Teradata optimizer will generate an execution plan that restricts access solely to the partitions with the relevant date. This eliminates the need for a full table scan (FTS).
Partitioned tables and non-partitioned tables are stored similarly, with rows distributed across all AMPs and organized by partition number, row hash value, and uniqueness value. In fact, non-partitioned tables are stored internally like partitioned tables, with all rows belonging to the dummy partition 0.
Row-partitioned tables store each row in a single physical row on a disk.
Column partitioning clusters one or more table columns to create partitions. Each partition represents a cluster. Physical rows within column-partitioned tables contain a subset of columns and do not represent complete logical rows.
Row partition clustering allows for complex expressions to be utilized, whereas column partitioning is restricted to the columns’ definition and creates a single cluster. Therefore, column partitioning is more straightforward to implement.
Row-partitioned tables offer the advantage of partition elimination, which allows the optimizer to limit search activity to the accessed partition when a query selects only one partition. This eliminates the need for a full table scan.
Column partitioned tables offer the advantage of selecting a subset of columns, thereby decreasing disk IO.
Combining column and row partitioning in a single table can enhance resource utilization.
Allow me to provide an example for analyzing four potential table configurations: without partitioning, with row partitioning, with column partitioning, and with both row and column partitioning.
CREATE SET TABLE THE_TABLE (COL_A INTEGER, COL_B INTEGER, COL_C INTEGER, COL_D INTEGER, COL_E INTEGER, COL_F INTEGER);
We are executing this query:
SELECT SUM(COL_A) FROM mytable WHERE COL_C IN (2,3,4);
Row partitioning will be implemented on the column COL_C, assuming that column containers are available for COL_A and COL_C.
- In the case of a column-partitioned table, all five rows but only two columns have to be accessed.
- All five rows and columns must be accessed in the not-partitioned table.
- All columns, but only three rows, must be accessed in a row-partitioned table.
- Only three rows and two columns must be accessed in a row and column-partitioned table.
Choosing the right partitioning can optimize resource utilization. However, it is crucial to understand your workload, as each partitioning approach has distinct pros and cons based on the type of workload. To learn more, please refer to our article on the Teradata columnar feature.