The Teradata Partitioned Primary Index (PPI) Guide
Even though solid state disks (SSDs) are increasingly replacing hard disks, access to mass storage will always be the slowest process in a database system. Therefore, all database providers search for methods to reduce the number of I/Os.
As of Release 14.10, Teradata provides functions for reducing I/Os such as column storage, block-level compression or Teradata Intelligent Memory.
These new features will definitely help to improve performance, but we would like to point you to a mature method that meets your improvement needs in equal measure:
Commonly queried rows are stored together in partitions so that you can restrict data retrieval to those partitions that contain the rows required by your SQL statement. Other partitions that do not contain requested rows are not read.
Incidentally, other database systems, such as Netezza, go the other way round by storing the information about the data blocks where the requested rows are not stored. Netezza called this method “Data Skipping”.
Partitioning will not help to avoid table scans, but it will reduce the number of data blocks that each AMP must move to its local memory (FSG cache) because only relevant partitions need to be accessed.
How is Teradata Partitioning implemented?
The idea behind the Teradata Partitioned Primary Index is to keep the records of a table packed together in clearly delineated containers of which there are not too many in total, about the size of the table. In that sense, Partitioning is just a different way of structuring the data records on the disks.
In the case of partitioned tables (PPI Tables), the Primary Index still determines the responsible AMP of each row. Only the way the rows are stored in the data blocks is different: While not partitioned tables store their rows sorted by rowid only, partitioned tables (PPI tables) store rows at first inside the corresponding partitions and only afterward sorted by the rowid.
Whenever a data record is accessed, the cylinder index of the AMP is queried to find out on which cylinder the first data block of the accessed partition is located. After positioning at the first data block, all remaining data blocks of the partition (or subpartitions in the case of multilevel partitioning) can be fetched into the FSG cache by scanning in a sequence with a minimum number of disk IO's.
It is a remarkable technical detail that Teradata internally considers all tables to be partitioned. NPPI tables are nothing else than PPI tables with exactly one partition, namely partition number zero, containing all table rows. You should always collect statistics on the dummy column PARTITION, even for not partitioned tables, as this is used by the optimizer to estimate the table cardinality of NPPI tables.
The process of accessing chunks of data along the partitioning attributes is often called partition elimination. This convention might lead you to believe that an aforementioned excluding approach is in place when in fact it is more of a “partition picking” that takes place.
You may wonder in which way partitioning is different from indexing techniques on Teradata.
Maybe the most notable difference is that all Teradata index types consume permanent space. Any Teradata index is a sub-table redundantly storing a subset of columns of its related base table.
While we still experience an impact on the occupied permanent disk space, this is not related to the redundant storage of columns but caused by the overhead of storing the records in a partitioned way: PPI table rows are 2 to 8 bytes wider than the equivalent not partitioned table row. The extra bytes are used to store the internal partition number of each row.
Traditionally, one of the primary applications for a PPI is partitioning by a date. Most reports have a date dimension as part of their reporting filters. Consider a call detail table in telecommunication companies. These are huge tables holding detailed call information about each single call for a particular time frame. Analytics take place directly on this one big table, and often along with joins to other tables are not mandatory in providing the query result.
Imagine a call detail table containing a full year of data on a daily level, but you only need one day in your report result dataset. If you partition this table by date, each AMP can restrict access exactly to this one date i.e. only 1/365 of the rows have to be moved from the disks into the AMPs memory (FSG cache).
Partitioning should always be considered over the use of other index types if ranges of data records have to be accessed together. Typical workloads are aggregations (“sum up all account balances for my business segment” etc.) which you often find in reporting requirements.
On the other hand, certain indexing techniques that allow direct access to the data blocks (primary index, unique secondary index, join index with matching primary index definition, etc.) are usually right for OLTP applications (“Give me the customer name for the client where customer id = 123”).
As you can combine partitioned tables with other index types, you can improve usability even further.
Until Teradata Release 12, several restrictions existed if you wanted to partition a table, especially in the context of data types. There was no possibility to partition character columns and timestamps. At least there was a workaround for character columns:
PARTITION BY RANGE_N(
(HASHBUCKET(HASHROW( ))) MOD 5 BETWEEN 0 AND 4 EACH 1)
With Teradata Release 13.10, character and timestamp column partitioning is possible in a direct way.
Even the system variable CURRENT_DATE can be used in the partitioning statement for timestamps:
PARTITION BY RANGE_N(CALENDAR_DT BETWEEN CURRENT_DATE-365 AND CURRENT_DATE EACH INTERVAL ‘1' DAY );
Using CURRENT_DATE in your partition statement makes it easy to increase your partitions up to the most current date:
— Delete the rows not matching any partition
ALTER TABLE <TABLE> TO CURRENT WITH DELETE;
— Insert the rows not matching into a backup table
ALTER TABLE <TABLE> TO CURRENT WITH INSERT INTO <BACKUP_TABLE>;
There is three kind of partitioning supported: A simple partitioning by column, the CASE_N syntax, and the RANGE_N syntax.
As the names already indicate, RANGE_N allows you to create ranges of rows which will end up in the same partition and CASE_N allows you to use the simple CASE … WHEN statement.
Here is one example of an often used RANGE_N partitioning by date:
CREATE TABLE <TABLE>
PK INTEGER, MyDate DATE)
PRIMARY INDEX (MyDate)
PARTITION BY RANGE_N (MyDate BETWEEN DATE ‘2014-01-01' AND DATE ‘2020-12-31' EACH INTERVAL ‘1' DAY, NO RANGE, UNKNOWN);
As you can see from the above example, there exist two distinct partitions which you can add to each table:
All rows not matching any of the defined partitions will end up in the NO RANGE bucket. If you do not specify a UNKNOWN partition and try to insert records not matching any specified partition, you will receive an error message, and the insert will fail.
The UNKNOWN partition is used to hold rows with NULL values in the partition expression.
Here is one example for a CASE_N partitioning:
CREATE TABLE <TABLE>
PK INTEGER, MyValue INTEGER)
PRIMARY INDEX (MyDate)
PARTITION BY CASE_N (MyValue < 1000 MyValue < 2000, MyValue < 3000, NO RANGE, UNKNOWN);
Again, we can add the two special partitions as required.
CASE_N partitioning should be utilized if you want to group values together into partitions.
Partitioning is not restricted to one level. Multilevel Partitioning allows the partitioning of tables on more than one level. Here is one example:
CREATE TABLE <TABLE>
PK INTEGER, MyValue INTEGER)
PRIMARY INDEX (MyDate)
PARTITION BY (
CASE_N (MyValue < 1000 MyValue < 2000, MyValue < 3000, NO RANGE, UNKNOWN),
RANGE_N (MyDate BETWEEN DATE ‘2014-01-01' AND DATE ‘2020-12-31' EACH INTERVAL ‘1' DAY, NO RANGE, UNKNOWN));
Just keep in mind that some partitions are limited. Up to release 13.10 of Teradata, the maximum number of partitions was 65535. Pay attention to the fact that this is the number of combined partitions in the case of multi-level partitioning and don't forget to count the NO RANGE and UNKNOWN partitions as well. You have to multiply the number of partitions on each level to get the overall number of partitions in use. Starting with Teradata 14.10 it is possible to have more than 9.000 Quintillion partitions.
When does Teradata partition elimination take place?
Many misconceptions exist about when partition elimination takes place.
First, you have to distinguish between partition elimination and primary index access. Partition elimination occurs independently of any involvement of the Primary Index in a query.
Although the primary index choice plays a significant role in partition decisions, most of the time it is the elimination of partitions that accounts for the vast leap in IO reduction.
These are the possible performance patterns for partitioned primary index access. They are dependent on how PI and Partitions are referenced:
Furthermore, for multi-level partitioning, you do not have to include all partitions in the WHERE condition to be able to cut partitions. Each of the partitions can be addressed independently, which is indeed a very nice feature in Teradata.
Now for the limitations of the primary index choice in the context of partitioning.
Many of us have heard about a performance difference between tables including all partition columns in the primary index and tables that do not include the partition columns in the primary index. Also, if tables are not including the partitioning columns, the primary index cannot be unique. What's behind all this?
To understand the relation between primary index uniqueness and partitioning, we have to recall how rows are stored on the disks:
In the first step, the rows are hashed to the responsible AMPs. In the second phase, they are inserted into the proper partitions and sorted by the ROWID.
Imagine at first a table with a non-unique primary index: Many rows can have the same primary index and will hash to the same AMP. However, each of these rows could belong to a different partition.
In the case of a not partitioned tables, all rows with the same hash are stored together, in the event of a partitioned table the rows are scattered across different partitions.
Now imagine it was allowed to have a partitioned table with a unique primary index and without the partition column being part of the primary index. Any update or insert statement would need Teradata to check each partition to avoid the creation of duplicates. Having to check each partition is very inefficient from a performance point of view.
Also, keep in mind that if the primary index is not including the partitioning columns, each time a primary index access is required, the responsible AMP has to scan all its partitions for this particular primary index. The scan of all partitions can be avoided if you include the partition columns into the primary index. I consider this a minor problem, more related to OLTP.
Table partitioning has other impacts on performance as well. Especially when joining a non-partitioned and a partitioned table together, Teradata has to apply different join techniques for non-partitioned tables. A Sliding Window merge join is one of the join techniques related to partitioning. Another option is to “de-partition” the partitioned table and doing a regular merge join or to partition the table which is not partitioned.
Similar to the joining of two non-partitioned tables, from a performance point of view, it is best to have the same primary index on both tables, and a join is taking place on all primary index columns and additionally on the partition columns. Anything else requires less efficient join techniques.
Hence, while the elimination of partitions may significantly reduce disk IO's you have to keep an eye on its join performance impact.
Some more benefits arise when using partitioned tables: The records of complete partitions can be removed from the table with a simple statement and, more important, without involving the transaction log:
MODIFY DROP RANGE BETWEEN DATE ‘2014-01-01' AND DATE ‘2013-06-30' EACH INTERVAL ‘1' DAY;
Finally, here are the “best practice” statistic recommendations for partitioned tables:
Columns in any not partitioned table that are equated to the partitioning column of the partitioned table
The system-derived PARTITION dummy column of all partitioned tables
I hope I could convince you that I consider partition elimination an important tool in the toolkit for performance optimization.
If you have any questions about all this, please ask in the comments! I’ll be paying close attention and answering as many as I can. Thank you for reading. Whatever this blog has become, I owe it all to you.
Roland Wenzlofsky is a graduated computer scientist and Data Warehouse professional working with the Teradata database system for more than 15 years. He is experienced in the fields of banking and telecommunication with a strong focus on performance optimization.