Introduction to the Teradata PPI
Even though solid-state disks (SSDs) are increasingly replacing hard disks, a database system’s access to mass storage will always be the slowest process. Therefore, all database providers search for methods to reduce the number of I/Os.
As of Release 14.10, Teradata provides functions for reducing I/Os, such as column storage, block-level compression, or Teradata Intelligent Memory.
These new features will improve performance, but we would like to point you to a mature method that meets your improvement needs in equal measure:
The partitioned primary index (PPI)
What is a Partitioned Primary Index?
Commonly queried rows are stored together in partitions so that you can restrict data retrieval to those partitions that contain the rows required by your SQL statement. Other partitions that do not have requested rows are not read.
Other database systems, such as Netezza, go the other way by storing the information about the data blocks where the requested rows are not stored. Netezza called this method “Data Skipping”.
Partitioning will not help avoid table scans, but it will reduce the number of data blocks each AMP must move to its local memory (FSG cache) because only relevant partitions need to be accessed.
How is Teradata Partitioning implemented?
The Teradata Partitioned Primary Index aims to keep the records of a table packed together in clearly delineated containers. There are not too many in total about the table’s size. In that sense, partitioning is a different way of structuring the data records on the disks.
In partitioned tables (PPI Tables), the primary index determines each row’s responsible AMP. Only the way the rows are stored in the data blocks is different: While not partitioned tables keep their rows sorted by rowid only, partitioned tables (PPI tables) store rows first inside the corresponding partitions and only afterward sorted by the rowid.
Whenever a data record is accessed, the AMP’s cylinder index is queried to determine which cylinder the first data block of the accessed partition is located. After positioning at the first data block, Teradata can fetch all remaining partition data blocks (or subpartitions in multi-level partitioning) into the FSG cache by scanning in a sequence with a minimum number of disk IOs.
It is a remarkable technical detail that Teradata internally considers all tables partitioned. NPPI tables are nothing other than PPI tables with precisely one partition, partition number zero, containing all table rows. Collecting statistics on the dummy column PARTITION would be best, even for not partitioned tables. The optimizer uses this to estimate the table cardinality of NPPI tables.
Accessing chunks of data along the partitioning attributes is often called partition elimination. This convention might lead you to believe that an aforementioned excluding approach is in place when, in fact, it is more of a “partition picking” that takes place.
You may wonder in which way partitioning is different from indexing techniques on Teradata.
The most notable difference is that all Teradata index types consume permanent space. Any Teradata index is a sub-table that redundantly stores a subset of its related base table columns.
While we still experience an impact on the occupied permanent disk space, this is not related to the redundant storage of columns but is caused by the overhead of storing the records in a partitioned way: PPI table rows are 2 to 8 bytes wider than the equivalent not partitioned table row. The extra bytes store the internal partition number of each row.
Traditionally, partitioning by date is one of the primary applications for a PPI. Most reports have a date dimension as part of their reporting filters. Consider a call detail table in telecommunication companies. These are huge tables holding detailed call information about every call for a particular time frame. Analytics takes place directly on this one big table, and often, along with joins to other tables, it is not mandatory to provide the query result.
Imagine a daily call detail table containing a full year of data, but you only need one day in your report result dataset. If you partition this table by date, each AMP can restrict access to this date, i.e., Teradata must move only 1/365 of the rows from the disks into the Amp’s memory (FSG cache).
We should always consider partitioning over other index types if ranges of data records have to be accessed together. Typical workloads are aggregations (“sum up all account balances for my business segment,” etc.), which you often find in reporting requirements.
On the other hand, specific indexing techniques that allow direct access to the data blocks (primary index, unique secondary index, join index with matching primary index definition, etc.) are usually suitable for OLTP applications (“Give me the customer name for the client where customer id = 123”).
Combining partitioned tables with other index types allows you to improve usability even further.
Several restrictions existed until Teradata Release 12 if you wanted to partition a table, especially regarding data types. There was no possibility of partitioning character columns and timestamps. At least there was a workaround for character columns:
PARTITION BY RANGE_N(
(HASHBUCKET(HASHROW( ))) MOD 5 BETWEEN 0 AND 4 EACH 1)
With Teradata Release 13.10, character and timestamp column partitioning is possible directly.
Even the system variable CURRENT_DATE can be used in the partitioning statement for timestamps:
PARTITION BY RANGE_N(CALENDAR_DT BETWEEN CURRENT_DATE-365 AND CURRENT_DATE EACH INTERVAL ‘1’ DAY );
Using CURRENT_DATE in your partition statement makes it easy to increase your partitions up to the most current date:
— Delete the rows not matching any partition
ALTER TABLE <TABLE> TO CURRENT WITH DELETE;
— Insert the rows not matching into a backup table
ALTER TABLE <TABLE> TO CURRENT WITH INSERT INTO <BACKUP_TABLE>;
Three partitionings are supported: A simple partitioning by column, the CASE_N syntax, and the RANGE_N syntax.
As the names already indicate, RANGE_N allows you to create ranges of rows that will end up in the same partition, and CASE_N enables you to use the simple CASE … WHEN statement.
Here is one example of an often-used RANGE_N partitioning by date:
CREATE TABLE <TABLE>
PK INTEGER, MyDate DATE)
PRIMARY INDEX (MyDate)
PARTITION BY RANGE_N (MyDate BETWEEN DATE ‘2014-01-01’ AND DATE ‘2020-12-31’ EACH INTERVAL ‘1’ DAY, NO RANGE, UNKNOWN);
As you can see from the above example, two distinct partitions exist that you can add to each table:
All rows not matching defined partitions will end up in the NO RANGE bucket. If you do not specify an UNKNOWN partition and try to insert records not matching any specified partition, you will receive an error message, and the insert will fail.
The UNKNOWN partition holds rows with NULL values in the partition expression.
Here is one example of a CASE_N partitioning:
CREATE TABLE <TABLE>
PK INTEGER, MyValue INTEGER)
PRIMARY INDEX (MyDate)
PARTITION BY CASE_N (MyValue < 1000 MyValue < 2000, MyValue < 3000, NO RANGE, UNKNOWN);
Again, we can add the two particular partitions as required.
CASE_N partitioning should be utilized if you want to group values into partitions.
Partitioning is not restricted to one level. Multi-level partitioning allows the partitioning of tables on more than one level. Here is one example:
CREATE TABLE <TABLE>
PK INTEGER, MyValue INTEGER)
PRIMARY INDEX (MyDate)
PARTITION BY (
CASE_N (MyValue < 1000 MyValue < 2000, MyValue < 3000, NO RANGE, UNKNOWN),
RANGE_N (MyDate BETWEEN DATE ‘2014-01-01’ AND DATE ‘2020-12-31’ EACH INTERVAL ‘1’ DAY, NO RANGE, UNKNOWN));
Just keep in mind that some partitions are limited. Up to release 13.10 of Teradata, the maximum number of partitions was 65535. Pay attention to the fact that this is the number of combined partitions in multi-level partitioning, and don’t forget to count the NO RANGE and UNKNOWN partitions. You have to multiply the number of partitions on each level to get the overall number of partitions in use. Starting with Teradata 14.10, it is possible to have more than 9.000 Quintillion partitions.
When does Teradata partition elimination take place?
Many misconceptions exist about when partition elimination takes place.
First, you have to distinguish between partition elimination and primary index access. Partition elimination occurs independently of any involvement of the Primary Index in a query.
Although the primary index choice plays a significant role in partition decisions, the elimination of partitions accounts for the vast leap in IO reduction.
These are the possible performance patterns for partitioned primary index access. They are dependent on how PI and Partitions are referenced:
Furthermore, you do not have to include all partitions in the WHERE condition for multi-level partitioning to cut partitions. Each partition can be addressed independently, an attractive feature in Teradata.
Now for the limitations of the primary index choice in partitioning.
Many of us have heard about a performance difference between tables, including all partition columns in the primary index, and tables that do not include the partition columns in the primary index. Also, the primary index cannot be unique if tables do not have the partitioning columns. What’s behind all this?
To understand the relation between primary index uniqueness and partitioning, we have to recall how rows are stored on the disks:
In the first step, the rows are hashed to the responsible AMPs. They are inserted into the proper partitions in the second phase and sorted by the ROWID.
Imagine a table with a non-unique primary index: Many rows can have the same primary index and hash to the same AMP. However, each of these rows could belong to a different partition.
In the case of a not partitioned table, all rows with the same hash are stored together. In the event of a partitioned table, the rows are scattered across different partitions.
Imagine it was allowed to have a partitioned table with a unique primary index without the partition column being part of the primary index. Any update or insert statement would need Teradata to check each partition to avoid the creation of duplicates. Having to check each partition is inefficient from a performance point of view.
Also, keep in mind that if the primary index is not including the partitioning columns, each time primary index access is required, the responsible AMP has to scan all its partitions for this particular primary index. We can avoid the scan of all partitions if you include the partition columns in the primary index. I consider this a minor problem, more related to OLTP.
Table partitioning has other impacts on performance as well. Especially when joining a non-partitioned and a partitioned table together, Teradata has to apply different join techniques for non-partitioned tables. A Sliding Window merge join is one of the join techniques related to partitioning. Another option is to “de-partition” the partitioned table and does a regular merge join, or we can partition the table which is not partitioned.
Similar to joining two non-partitioned tables, it is best to have the same primary index on both tables from a performance point of view. A join takes place on all primary index columns and the partition columns. Anything else requires less efficient join techniques.
Hence, while eliminating partitions may significantly reduce disk IOs, you must watch its join performance impact.
Some more benefits arise when using partitioned tables: The records of complete partitions can be removed from the table with a simple statement and, more importantly, without involving the transaction log:
MODIFY DROP RANGE BETWEEN DATE ‘2014-01-01’ AND DATE ‘2013-06-30’ EACH INTERVAL ‘1’ DAY;
Finally, here are the “best practice” statistic recommendations for partitioned tables:
- Table PI
- Partitioning column(s)
- Columns in any not partitioned table that are equated to the partitioning column of the partitioned table
- The system-derived PARTITION dummy column of all partitioned tables
I could convince you that I consider partition elimination an essential tool in the toolkit for performance optimization.
More on indexing & partitioning here:
The Secondary Index in Teradata – The NUSI compendium