Teradata Partitioning and Performance
Depending on the application, a different Teradata Partitioning strategy may be used.
Teradata Partitioning has one goal – to optimize the performance of your queries. Therefore you should ask yourself the following questions before creating a PPI table:
Considerations to be made before Partitioning a Table
How big is the Table?
Especially large tables are suitable for row-level partitioning. This is because selecting individual partitions on small tables is often more expensive than performing a full table scan.
However, partitioning a small table identically to a large table to which it is joined can make sense for performance reasons.
How often is a Column used in a WHERE Condition?
A column is all the more suitable as a partition column if WHERE conditions are frequently used, which results in only a fraction of the table rows is selected.
The reverse is also true:
If there is no column after which a WHERE condition is often restricted, or the WHERE conditions are not very selective, it makes no sense to use Teradata Partitioning.
Should the Partition Column be part of the Primary Index?
If this does not result in other performance disadvantages, the columns should always be part of the primary index.
The advantage if the partition column is in the primary index: Each primary index value can only be present in exactly one partition. Therefore the query is more performant.
Here is an example where it is a disadvantage to include the partition column in the primary index:
CREATE TABLE Sales
SalesId NOT NULL,
SalesDate DATE NOT NULL,
Price DECIMAL(18,2) NOT NULL
) PRIMARY INDEX (SalesId,SalesDate)
PARTITION BY RANGE_N ( SalesDate BETWEEN DATE ‘2019-01-01’ AND ‘2019-12-31’ EACH INTERVAL ‘1’ DAY );
The following kind of tactical query is mainly executed:
SELECT * FROM Sales WHERE SalesId = <a particular SalesId>;
In this case, the table cannot be queried using the primary index. Therefore, SalesDate should not be in the primary index.
Does it make any difference how many Partitions I define?
Yes, because the more partitions there are, the worse the performance will be if you join with an NPPI table or a PPI table partitioned differently.
For example: If you always query whole months, you should not create day partitions.
Can Teradata handle all Types of Partitioning equally well, i.e. does the Optimizer recognize Partition Elimination Opportunities equally well?
Although the optimizer gets better in it, simple partition expressions are preferable—the more complex a partition expression, the greater the risk that the optimizer overlooks partition elimination opportunities.
Should all Teradata Partitions always contain approximately the same Number of Rows?
No, that depends on the application. See the next section for details.
Teradata Partitioning Strategies
- Each partition has about the same number of rows
- Some Partitions have a small number of rows, other partitions a large number of rows:
For example, it makes sense to partition the part of the data that is requested more often finer (more partitions with fewer rows).
- All partitions have the same number of different values, but the number of rows per partition can differ.
I leave it up to you to think when we should use which Teradata Partitioning strategy. Leave your ideas in the comments; then, other readers can benefit too!