Teradata Partitioning and Performance
We can use Teradata Partitioning in several ways. In this article, we will discuss the possible strategies. The goal of all strategies is to reduce the resource consumption of our queries. The questions below will help you decide how best to use row-level partitioning.
Considerations to be made before Partitioning a Table
How big is the Table?
Huge tables are suitable for row-level partitioning. For the Optimizer, selecting individual partitions on small tables is not worthwhile, and a full table scan is often just as efficient.
Nevertheless, partitioning a small table identically to a large table can improve performance for joins.
How often is a Column used in WHERE Conditions?
A column is more suitable as a partition column if WHERE conditions are frequently applied to this column, resulting in only a fraction of the table rows being read.
The opposite statement is also true:
If there is no column after which a WHERE condition is frequently restricted, or the selectivity of the WHERE conditions is low, it makes no sense to use Teradata Partitioning.
Should the Partition Column be part of the Primary Index?
If this does not result in other performance disadvantages, the columns should always be part of the primary index.
The advantage when partition columns are in the primary index: Each primary index value can only be present in exactly one partition. Therefore the query is more performant as no partition probing is needed to locate the rows.
Here is an example where it is a disadvantage to include the partition column in the primary index:
CREATE TABLE Sales
SalesId NOT NULL,
SalesDate DATE NOT NULL,
Price DECIMAL(18,2) NOT NULL
) PRIMARY INDEX (SalesId,SalesDate)
PARTITION BY RANGE_N ( SalesDate BETWEEN DATE ‘2019-01-01’ AND ‘2019-12-31’ EACH INTERVAL ‘1’ DAY );
The following kind of tactical query is mainly executed:
SELECT * FROM Sales WHERE SalesId = <a particular SalesId>;
In this case, the Table cannot be queried using the primary index. Therefore, SalesDate should not be in the primary index.
Does it make any difference how many Partitions we define?
Yes, because the more partitions there are, the worse the performance will be if we join with an NPPI table or a PPI table partitioned differently. Partition probing can also become an issue if the partition columns are not part of the Primary Index.
For example: If we always query whole months, we should not create daily partitions.
Can Teradata handle all Types of Partitioning equally well, i.e., does the Optimizer recognize Partition Elimination Opportunities?
Although the Optimizer gets better at it, simple partition expressions are preferable—the more complex a partition expression, the greater the risk that the Optimizer overlooks partition elimination opportunities.
Should all Teradata Partitions always contain approximately the same number of Rows?
No, that depends on the application. See the next section for details.
Teradata Partitioning Strategies
- Each partition has about the same number of rows
- Some Partitions have a small number of rows, while other partitions have a large number of rows:
For example, it makes sense to partition the part of the requested data more often (more partitions with fewer rows).
- All partitions have the same number of different values, but the number of rows per partition can differ.
I leave it up to you to judge when we should use which Teradata Partitioning strategy. Leave your ideas in the comments; other readers can benefit too!
[…] Row Partitioning is by definition not a tool to design a PDM optimized for the tactical workload. However, it does support queries by replacing full table scans with more efficient partition scans. […]