Teradata Partitioning Strategies Anyone Would Be Proud Of
Teradata Partitioning and Performance
Depending on the application, a different Teradata Partitioning strategy may be used.
Considerations to be made before Partitioning a Table
Especially large tables are suitable for row-level partitioning. This is because selecting individual partitions on small tables is often more expensive than performing a full table scan.
However, partitioning a small table identically to a large table to which it is joined can make sense for performance reasons.
A column is all the more suitable as a partition column if WHERE conditions are frequently used which results in only a fraction of the table rows is selected.
The reverse is also true:
If there is no column after which a WHERE condition is often restricted or the WHERE conditions are not very selective, it makes no sense to use Teradata Partitioning.
If this does not result in other performance disadvantages, the columns should always be part of the primary index.
The advantage if the partition column is in the primary index: Each primary index value can only be present in exactly one partition, therefore the query is more performant.
Here is an example where it is a disadvantage to include the partition column in the primary index:
CREATE TABLE Sales
SalesId NOT NULL,
SalesDate DATE NOT NULL,
Price DECIMAL(18,2) NOT NULL
) PRIMARY INDEX (SalesId,SalesDate)
PARTITION BY RANGE_N ( SalesDate BETWEEN DATE ‘2019-01-01' AND ‘2019-12-31' EACH INTERVAL ‘1' DAY );
The following kind of tactical query is mainly executed:
SELECT * FROM Sales WHERE SalesId = <a particular SalesId>;
In this case, the table cannot be queried using the primary index. Therefore, SalesDate should not be in the primary index.
Yes, because the more partitions there are, the worse the performance will be if you join with an NPPI table or a PPI table that is partitioned differently.
For example: If you always query whole months, you should not create day partitions.
Although the optimizer gets better in it, simple partition expressions are preferable. The more complex a partition expression, the greater the risk that the optimizer overlooks partition elimination opportunities.
No, that depends on the application. See the next section for details.
Teradata Partitioning Strategies
- Each partition has about the same number of rows
- Some Partitions have a small number of rows, other partitions a large number of rows:
For example, it makes sense to partition the part of the data that is requested more often finer (more partitions with fewer rows).
- All partitions have the same number of different values but the number of rows per partition can be different.
I leave it up to you to think when we should use which Teradata Partitioning strategy. Leave your ideas in the comments, then other readers can benefit too!