All Rows on One AMP – By Design (Part 2)

In an earlier article (link to All Rows on One AMP – By Design) I showed an alternative design for small reference tables that ensures all rows in the table are stored on a single AMP. The design allows single-AMP retrieval of the entire table, usually before a join, and so uses fewer AWTs.

During a recent Teradata education class that I was teaching, a student commented that most, if not all, of their reference tables were too large to use that approach because they all contain history rows (effectively, they are 'slowly changing dimensions'). I was asked whether my proposed design could be extended to slightly larger reference tables.

This article discusses four design options that I’ve investigated as a result of those discussions.

During this investigation and testing, I also started working on Teradata 16.10, which includes the "Multiple Hash Maps" (MHM) feature; this provides some options for the designs included here.

Three Scenarios – Small, Medium, and Large

During the investigation and testing of this requirement, I quickly realized that the design needs to cater for different sizes of reference table. For this article I've described these as 'Small', 'Medium', and 'Large', defined as follows:

  • Small: The entire table fits into 1 or 2 data blocks (typically 1 to 2 MB of PERM space). I suspect that such tables will often not contain any history of changes; the table holds only 'current' rows.

  • Medium: All rows (both 'current' and 'history') fit into a few data blocks, probably less than one cylinder (approximately 11 MB). The 'current' rows fit into 1 or 2 data blocks (typically 1 to 2 MB of PERM space).

  • Large: The 'current' rows fit into 1 or 2 data blocks (typically 1 to 2 MB of PERM space), but the 'history' rows can grow as large as required (to meet business requirements).

What is the requirement?

This design is based on the following requirements, which I have come across at some customer sites. They may not exactly match yours, but I hope they are very similar.

  • The table contains 'current' rows and, optionally, 'history' rows.
  • The table contains 'start date' and 'end date' columns which define when the data in the row is effective.
  • The current row is indicated by a specific value in the 'end date' column. In a Teradata system this is often NULL, "3500-12-31", or "9999-12-31".
  • Typical queries using this table only require ‘current’ rows.

What are the new design options?

With the requirements stated above my suggested design depends on two factors:

  1. Is the reference table 'Small', 'Medium', or 'Large'?
  2. Which version of the Teradata DBMS is being used? The important distinction is "< TD 16.10" versus ">= TD 16.10".

Having determined where your table fits regarding the two factors shown above, use the following table to determine which design is best for you. The designs are explained in detail after this table and examples are given.

Small

  • Teradata < 16.10: Add an extra column to the table definition and make it the table's PI. All rows have the same value for this column. (Design#1)
  • Teradata >= 16.10: Create the table with a 'natural' definition (UPI on the code column) and assign it to a 1-AMP Hash Map. (Design#2)

Medium

  • Teradata < 16.10: Use the 'Large' table design. (Design#4)
  • Teradata >= 16.10: Create the table using a NUPI on the code column and add Row Partitioning using the 'current row' column ("End date" for this article) as the PPI column. Define the partitioning scheme so that all 'current' rows go into a single partition, and assign the table to a 1-AMP Hash Map. (Design#3)

Large (any Teradata version)

  • Create the table with a 'natural' definition (UPI on the code column) and create a sparse Single-Table Join Index (STJI) which has a WHERE clause that includes the 'current' row data value. (Design#4)

Note: It is critical that the WHERE clause is not an equality constraint against this 'current' row data value, but a range that includes it.

Note: It is critical that the 'current' row data value is non-null.

The following section shows examples of each of the four designs.

Design#1: Small table, < TD 16.10

The table definition is as shown in my previous article on this subject (link to All Rows on One AMP – By Design):

CREATE SET TABLE small_ref_table
(picol INTEGER NOT NULL    -- constant-value PI column: all rows hash to one AMP
,ref_code CHAR(2) NOT NULL
,ref_desc VARCHAR(200))
PRIMARY INDEX (picol);
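To make this concrete, here is a minimal sketch of how the constant PI value is typically used; the view name and the constant value 1 are my assumptions, not part of the original design:

REPLACE VIEW v_small_ref_table AS
SELECT ref_code, ref_desc
FROM small_ref_table
WHERE picol = 1;    -- constant PI value: produces the single-AMP retrieve shown below

-- every row is loaded with the same constant
INSERT INTO small_ref_table VALUES (1, 'AA', 'Example description');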

With the above definition in place, a typical query which joins to this table now includes (in the view definition) selection against the Primary Index column and so the relevant part of the plan becomes:

2) Next, we do a single-AMP RETRIEVE step from small_ref_table by way of
the primary index "small_ref_table.picol = 1" with no residual
conditions into Spool 2 (all_amps), which is duplicated on all
AMPs. The size of Spool 2 is estimated ...

Design#2: Small table, >= TD 16.10

This table definition uses a 'natural' Primary Index and relies on the TD 16.10 MHM feature to ensure that all primary data rows are assigned to a single AMP.

Note: the actual map name ("TD_1AmpSparseMap_1Node" in the example below) may be different on your system.

CREATE SET TABLE small_ref_table
,MAP = TD_1AmpSparseMap_1Node   -- 1-AMP sparse hash map: all rows stored on one AMP
(ref_code CHAR(2) NOT NULL
,ref_desc VARCHAR(200))
UNIQUE PRIMARY INDEX (ref_code);
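If the table already exists, it can also be moved onto the sparse map rather than recreated; a sketch, assuming the same map name is available on your system:

-- relocate all rows of the table onto the 1-AMP sparse map
ALTER TABLE small_ref_table, MAP = TD_1AmpSparseMap_1Node;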

With the above definition in place, a typical query which joins to this table now includes (in the view definition) selection against the Primary Index column, and so the relevant part of the plan becomes:

4) We do a single-AMP RETRIEVE step from small_ref_table by way of an
all-rows scan with no residual conditions into Spool 3 (all_amps),
which is duplicated on all AMPs in TD_Map1. The size of Spool 3
is estimated …

Design#3: Medium table, >= TD 16.10

This table definition keeps the 'code' column as the Primary Index but adds row partitioning on the 'end_date' column (the one used to identify the 'current' row). The table now contains 'history' rows, and so the PI is defined as non-unique.

Note: the actual map name ("TD_1AmpSparseMap_1Node" in the example below) may be different on your system.

CREATE TABLE medium_ref_table
,MAP = TD_1AmpSparseMap_1Node   -- 1-AMP sparse hash map: all rows stored on one AMP
(ref_code CHAR(2) NOT NULL
,ref_desc VARCHAR(200)
,start_date DATE NOT NULL
,end_date DATE NOT NULL)        -- uses 2099-12-31 as the 'current row' indicator value
PRIMARY INDEX (ref_code)
PARTITION BY RANGE_N(end_date BETWEEN DATE '2010-01-01' AND DATE '2020-12-31'
EACH INTERVAL '1' DAY, NO RANGE);  -- 'current' rows (2099-12-31) all land in the NO RANGE partition
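The equality selection on the partitioning column is typically hidden in a base view; a minimal sketch (the view name is my assumption):

REPLACE VIEW v_medium_ref_table AS
SELECT ref_code, ref_desc, start_date, end_date
FROM medium_ref_table
WHERE end_date = DATE '2099-12-31';   -- 'current row' indicator: enables single-partition access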

With the above definition in place, a typical query which joins to this table now includes (in the view definition) selection against the partitioning column so that only 'current' rows are returned, requiring only one partition to be accessed. The relevant part of the plan becomes:

4) We do a single-AMP RETRIEVE step from a single partition of
medium_ref_table with a condition of ("medium_ref_table.end_date
= DATE '2099-12-31'") with a residual condition of
("medium_ref_table.end_date = DATE '2099-12-31'") into Spool 3
(all_amps), which is duplicated on all AMPs in TD_Map1. The size of
Spool 3 is estimated ...

Note: a slight alternative to the above design is to use NULL in column 'end_date' to identify the 'current' row. If that is your requirement, change the design to include the special 'UNKNOWN' partition (instead of 'NO RANGE') and change the selection criteria to "WHERE end_date IS NULL", as sketched below.
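A minimal sketch of that variant (the table name is hypothetical; end_date must then be nullable):

CREATE TABLE medium_ref_table_nul
,MAP = TD_1AmpSparseMap_1Node
(ref_code CHAR(2) NOT NULL
,ref_desc VARCHAR(200)
,start_date DATE NOT NULL
,end_date DATE)                  -- NULL marks the 'current' row
PRIMARY INDEX (ref_code)
PARTITION BY RANGE_N(end_date BETWEEN DATE '2010-01-01' AND DATE '2020-12-31'
EACH INTERVAL '1' DAY, UNKNOWN); -- rows with a NULL end_date land in the UNKNOWN partition

-- the base view then selects: WHERE end_date IS NULL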

Design#4: Large table (all TD releases), Medium table < TD 16.10

This design uses a sparse, single-table join index (STJI) to give us single-AMP access to the required data for 'current' rows. The optimizer uses the STJI to partially cover the query.

CREATE TABLE large_ref_table
(ref_code CHAR(2) NOT NULL
,ref_desc VARCHAR(200)
,start_date DATE NOT NULL
,end_date DATE NOT NULL)   -- uses 2099-12-31 as the 'current row' indicator value
UNIQUE PRIMARY INDEX (ref_code);

CREATE JOIN INDEX large_ref_table_stji
AS
SELECT end_date
,ref_code
,ref_desc
,start_date
FROM large_ref_table
WHERE end_date BETWEEN DATE '2099-12-30' AND DATE '2099-12-31'  -- a range that INCLUDES the 'current row' value
PRIMARY INDEX (end_date);

Note: It is critical that the WHERE clause in the STJI definition is not an equality constraint against the 'current' row data value, but a range that includes it.

Note: It is critical that the 'current' row data value is non-null. If you use NULL as the 'current' row data value, you will not get a single-AMP retrieve from the STJI. This is because of a specific query rewrite feature in Teradata.
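As with the other designs, the equality selection usually lives in a base view; a sketch (the view name is my assumption). Note that the query side uses an equality condition; only the STJI definition itself needs the range:

REPLACE VIEW v_large_ref_table AS
SELECT ref_code, ref_desc, start_date, end_date
FROM large_ref_table
WHERE end_date = DATE '2099-12-31';   -- the optimizer answers this from the sparse STJI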

With the above definition in place, a typical query which joins to this table now includes (in the view definition) selection against the 'end_date' column so that only 'current' rows are returned. The optimizer uses the STJI instead of the base table, and the relevant part of the plan becomes:

3) We do a single-AMP RETRIEVE step from LARGE_REF_TABLE_STJI
by way of the primary index "LARGE_REF_TABLE_STJI.end_date
= DATE '2099-12-31'" with no residual conditions into Spool 3
(all_amps), which is duplicated on all AMPs in TD_Map1. The size
of Spool 3 is estimated ...

An additional benefit of this design which may prove useful to you is that your base table (“large_ref_table” in my example) can contain additional columns. For instance, you could have other columns which record:

  • The name of the user that initially created each row (CREATE_USERNAME).
  • The date and time when a row was created (CREATE_TS).
  • The name of the user that last updated each row (LASTALTER_USERNAME).
  • The date and time when a row was last updated (LASTALTER_TS).

If you include these columns in your base table design, do not include them in the STJI definition unless the majority of queries also need them (remember that the intent is to have all real data rows for the STJI fit into 1 or 2 data blocks).

When Can I (And When Can I Not) Use This Design?

Unlike the original design that I described ('Design#1' above), I think this approach can be used against a reference table of any size. You are no longer constrained by the overall table size, only by the size of the 'current' rows.

Here are some guidelines to help you when thinking about how and where to use the designs from this article:

  • Add the appropriate WHERE clause into the base view that references the table.
  • The aim is to reduce the resources used for processing; this will not noticeably change response times, although you may find that a frequently referenced table (or at least its 'current' rows) will simply stay resident in memory on the node.
  • If you decide to use Design#1 for some tables, do not use the same data value for column PICOL on all tables. Doing so would potentially overload one AMP. Use a different value per table (see the sketch after this list) to ensure that the table rows are assigned to different AMPs.
  • If you decide to use Design#2 or Design#3 for some tables, do not use the same value for the COLOCATE USING clause on all tables. Doing so would potentially overload one AMP. I suggest that you do not code the COLOCATE USING clause at all, so that different tables' rows are assigned to different AMPs.
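As a sketch of the PICOL guideline, here is a hypothetical second Design#1 table that uses a different constant (2 instead of 1), so that its rows hash to a different AMP than small_ref_table:

CREATE SET TABLE currency_ref_table
(picol INTEGER NOT NULL       -- a different constant per table spreads the tables across AMPs
,curr_code CHAR(3) NOT NULL
,curr_desc VARCHAR(200))
PRIMARY INDEX (picol);

INSERT INTO currency_ref_table VALUES (2, 'EUR', 'Euro');   -- all rows of this table use picol = 2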

Summary

As before, this design approach is NOT going to make SQL requests run noticeably faster. However, it will make them run more efficiently because they use fewer AWTs, CPU cycles, and I/O operations to achieve the same result.

The Costs of Decomposable Columns

What are Decomposable Columns?

Decomposable columns are columns that can be split into smaller, meaningful units. Their presence means that the physical data model is not fully normalized.

If we don't use a decomposable column for data retrieval, there is no negative impact on performance.

Still, if the column is used for data access, it should be decomposed into several columns that can be used independently for indexed access. Performance improves because fewer full table scans are needed.

Additionally, if the columns created from a decomposable column are NUSI columns, the possibilities for NUSI bitmapping increase.

Here is an example:

-- Select all Private Gold products being sold by our company
SELECT * FROM Transaction WHERE ProductCode LIKE 'PRIVGOLD%';

-- Select all Private products being sold by our company
SELECT * FROM Transaction WHERE ProductCode LIKE 'PRIV%';

In both cases, a full table scan is typically required to access the data.

By decomposing the column ProductCode, we give the Optimizer a chance to use existing NUSIs:

CREATE INDEX (ProductSegment) ON Transaction;
CREATE INDEX (ProductLevel) ON Transaction;

SELECT * FROM Transaction WHERE ProductSegment = 'PRIV' AND ProductLevel = 'Gold';
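The decomposed query above assumes that the physical model actually carries the extra columns; a minimal sketch of such a table (all names and types are my assumptions):

CREATE MULTISET TABLE Transaction
(TransactionId BIGINT NOT NULL
,ProductCode CHAR(12) NOT NULL      -- the decomposable column, e.g. 'PRIVGOLD0001'
,ProductSegment CHAR(4) NOT NULL    -- decomposed part 1, e.g. 'PRIV'
,ProductLevel CHAR(4) NOT NULL      -- decomposed part 2, e.g. 'Gold'
,Amount DECIMAL(18,2))
PRIMARY INDEX (TransactionId);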

Often we can’t change the physical data model, and we have to find workarounds to avoid the negative performance impact of decomposable columns. Here is a simple but efficient solution:

SELECT p.ProductDescription
,t.ProductCode
FROM Transaction t
INNER JOIN Product p
ON t.ProductCode = p.ProductCode
WHERE p.ProductCode LIKE '%Gold%';

The table Product is just a lookup containing all distinct ProductCodes. The Optimizer will probably do a full table scan on the Product table with a subsequent merge join to the Transaction table. As the Product table is small, the cost of the full table scan is negligible.
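For completeness, a sketch of what such a lookup table might look like (the definition is my assumption):

CREATE SET TABLE Product
(ProductCode CHAR(12) NOT NULL
,ProductDescription VARCHAR(200))
UNIQUE PRIMARY INDEX (ProductCode);  -- one row per distinct ProductCode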

Still, the first choice is to avoid the implementation of decomposable columns whenever possible.

Questions?
If you have any questions about all this, please ask in the comments! I’ll be paying close attention and answering as many as I can. Thank you for reading. Whatever this blog has become, I owe it all to you.

8 Teradata Data Access Paths Explained

The Teradata Data Access Paths

Full Table Scan
Advantages: Touches each row only once and provides access for any combination of columns in the WHERE condition predicate. A good choice if all rows need to be touched.
Disadvantages: Reads each data block and usually requires a spool table as large as the base table.

Unique Primary Index (UPI)
Advantages: The cheapest access path, used when the SQL statement contains the primary index columns in the WHERE condition predicate. Requires only one AMP. No spool file is created. Allows row-hash locking, the most granular locking method available.
Disadvantages: None.

Nonunique Primary Index (NUPI)
Advantages: Provides efficient access and involves just one AMP. Allows row-hash locking; may or may not require a spool file, depending on the number of rows returned.
Disadvantages: May have bad INSERT performance for SET tables (duplicate row check).

Unique Secondary Index (USI)
Advantages: Provides an efficient access path if the request's WHERE condition predicates contain the USI columns. Typically involves only two AMPs. No spool table is created.
Disadvantages: Causes additional overhead for INSERT, UPDATE, MERGE, and DELETE statements.

Nonunique Secondary Index (NUSI)
Advantages: Provides efficient access when the number of distinct values is low, although it involves all AMPs and may create a spool table.
Disadvantages: Causes overhead for INSERT, UPDATE, MERGE, and DELETE statements. It will not be used by the Optimizer if the number of data blocks accessed is high, as it is an all-AMP access and a full table scan is often cheaper.

Single-Table Join Index (JI) and Hash Index
Advantages: Ideal when restricted to often-used columns, as it can reduce the number of I/Os. Can have a different PI than the base table.
Disadvantages: Causes overhead for INSERT, UPDATE, MERGE, and DELETE statements.

Multitable Join Index (JI)
Advantages: Eliminates the need to perform joins and aggregations, and may be able to satisfy a query without referencing the base tables. It can have a different PI than the base tables and can replace a NUSI or a USI.
Disadvantages: Causes overhead for INSERT, UPDATE, MERGE, and DELETE statements on any of the base tables that are part of the multitable join index. A bad choice for tables with a large number of changes.

Sparse Join Index (JI)
Advantages: Limits the number of rows stored compared with a normal join index, which limits the overhead for INSERT, UPDATE, MERGE, and DELETE statements. If restricted to highly selective column values, the Optimizer can use the join index to access less common values.
Disadvantages: Causes overhead for INSERT, UPDATE, MERGE, and DELETE statements.
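As an illustration of the last row, here is a sketch of a sparse join index restricted to a highly selective value (all table and column names are hypothetical):

-- only the rare 'OPEN' rows are stored in the index, keeping maintenance overhead low
CREATE JOIN INDEX orders_open_ji AS
SELECT order_id, customer_id, order_status
FROM orders
WHERE order_status = 'OPEN'
PRIMARY INDEX (customer_id);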

See also:
The Teradata Join Index Guide – We leave no Questions unanswered!

Teradata Value Ordering for Join Index & NUSI

On the internet, several articles are available about Teradata hashing and how physical data rows are evenly distributed across all AMPs and sorted by ROWID.

There is almost no information available about value ordering. The only readily available information I know of is the official Teradata documentation, which gives a vague statement like "value order is a way of allowing for optimized range scans of values." Most technical descriptions close the topic with an example showing how to use WHERE conditions on a range of dates. Obviously, this is not enough for a firm understanding of the subject.

I think it is time to shed some light on value ordering.

In principle, we have two options for value ordering:

  • The non-unique secondary index (NUSI), which historically was the only index type that allowed value ordering.
  • The join index, for which value ordering was introduced with Teradata 12.
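Here is a minimal sketch of both variants (table and column names are my assumptions):

-- Value-ordered NUSI on a date column
CREATE INDEX (transaction_date) ORDER BY VALUES (transaction_date) ON transactions;

-- Value-ordered single-table join index (Teradata 12 and later)
CREATE JOIN INDEX transactions_vo_ji AS
SELECT transaction_id, transaction_date, amount
FROM transactions
ORDER BY transaction_date
PRIMARY INDEX (transaction_id);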

How Teradata Value Ordering is implemented

Those of you who have already read our physical design posts will know how data retrieval works on a Teradata system:

  1. The Parsing Engine calculates a hash value (the ROWHASH) over the Primary Index column(s) of the row.
  2. The so-called hash map is consulted to identify the responsible AMP for each row.
  3. The AMP checks its Master Index to determine the cylinder carrying the row(s).
  4. The AMP checks the Cylinder Index to determine the disk sector(s) holding the row(s) (a sequence of sectors makes up a data block).

Usually, in the case of Primary Index hashing, the Master Index and the Cylinder Index carry information about where each searched ROWHASH can be found (i.e., on which cylinder and in which sectors the rows are stored). Furthermore, row pointers within a data block are maintained, sorted by ROWID.

Value ordering means that the Master Index and Cylinder Index store information about where a 32-bit value (not the ROWHASH) is located. Furthermore, the data block contains a row pointer array sorted by this 32-bit value.

Value ordering is only allowed for 32-bit integers (which includes the DATE data type, as it is stored internally as a 32-bit integer). The reason is that the physical structures (row pointer array, Master Index dictionary, Cylinder Index dictionary) used by the fundamental hashing technique have been reused without any design changes.

Let me make some important remarks here: value-ordered access is always an all-AMP operation. Although the join index allows us to define any Primary Index, that index is only used for data distribution in this case. The second option, the NUSI, by design always requires an all-AMP operation anyway.

Value ordering of a join index gives the same advantages one would have from a PPI table with a single level of partitioning, but with less space overhead.

Value-ordered index access is always an all-AMP access with a subsequent search for a 32-bit value.

I hope it is clear now why the strength of the value ordering method lies in the ordering of the 32-bit integer column in each of the physical structures (Master Index, Cylinder Index, data block).
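To see the benefit, consider a typical range predicate against the value-ordered structures sketched above (names again assumed):

SELECT transaction_id, amount
FROM transactions
WHERE transaction_date BETWEEN DATE '2015-01-01' AND DATE '2015-01-31';
-- still an all-AMP access, but each AMP can narrow the scan to the blocks
-- covering the date range instead of reading the whole structure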

More on indexing here: Teradata Indexing


A great Performance Trick for the Teradata PPI

Get a Grip on Clumpy Row Distribution – the Teradata PPI Table

For this trick to function properly, the Teradata Primary Index must not contain all of the partition column(s).

At some point in time, we have all experienced this problem with table inserts: although the table rows are distributed evenly across all AMPs, the merge step into the table seems to take forever.

One very likely reason is that we are confronted with a SET table and many identical NUPI values. The AMPs are forced to do a duplicate row check for each inserted row. This process can take an enormous amount of time and waste a lot of CPU cycles, as each new row is compared in full against all the existing rows having the same Primary Index value.

While this may be solved by switching over to a MULTISET design in the case of nonpartitioned tables (a sketch follows), we have another great opportunity to address this problem with Teradata partitioned tables (PPI tables).
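For nonpartitioned tables, the MULTISET switch looks like this (a sketch; the table name is hypothetical). MULTISET skips the duplicate row check at the price of allowing exact duplicate rows:

CREATE MULTISET TABLE Nonpartitioned_Example
(the_code INTEGER NOT NULL
,the_date DATE NOT NULL
,the_value DECIMAL(18,0))
PRIMARY INDEX (the_code);   -- no duplicate row check on INSERT, even with many equal NUPI values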

Assume we have the following table design:

CREATE TABLE MultiLevel_Example
( the_code  INTEGER NOT NULL
, the_date  DATE NOT NULL
, the_value DECIMAL(18,0)
) PRIMARY INDEX (the_code)   -- NUPI; not all partitioning columns are part of the PI
PARTITION BY (
  RANGE_N(the_date BETWEEN DATE '2015-01-01' AND DATE '2015-12-31' EACH INTERVAL '1' MONTH),
  RANGE_N(the_code BETWEEN 1 AND 5 EACH 1)
);

As you can see, not all of the partition columns are part of the Primary Index (the_date is missing) – this is the prerequisite for the trick to work!

Below is a schematic drawing of the internal representation of the table:

[Figure: schematic layout of the PPI table rows, showing columns "C", "L1", "L2", and "H"]

The column labeled "C" is the combined partition number, "L1" the first level of partitioning, and "L2" the second. The column labeled "H" represents the ROWHASH of the_code.

"Whenever we create a PPI table which does not contain all partitioning columns in the Primary Index definition, equal NUPI values are spread across different partitions, reducing the clumps and therefore improving performance, as the number of rows to be compared in each duplicate row check becomes significantly smaller."

Nevertheless, be aware that leaving partition columns out of the Primary Index also comes with a penalty, and you have to weigh the advantages against the disadvantages:

In the case of a primary index access, all partitions have to be probed to find the searched Primary Index value. Furthermore, merge joins may execute more slowly, and aggregations are more costly, as they have to take place across partitions.

Let us take a short look at why this trick will not work for a Teradata PPI table when all partition columns are contained in the Primary Index definition:

CREATE TABLE MultiLevel_Example
( the_code  INTEGER NOT NULL
, the_date  DATE NOT NULL
, the_value DECIMAL(18,0)
) PRIMARY INDEX (the_code, the_date)   -- all partitioning columns are now part of the PI
PARTITION BY (
  RANGE_N(the_date BETWEEN DATE '2015-01-01' AND DATE '2015-12-31' EACH INTERVAL '1' MONTH),
  RANGE_N(the_code BETWEEN 1 AND 5 EACH 1)
);

Take a look at the illustration below. As the ROWHASH is calculated over all Primary Index columns, which here include all partition columns, it is by design impossible for the same ROWHASH to end up in different partitions (take your time to convince yourself of this):

[Figure: schematic layout with all partition columns in the PI – each ROWHASH appears in exactly one partition]

That’s all. I hope you enjoyed it. Feel free to ask any questions and let me know if something is unclear.


Watch the Teradata Indexing Video Course


This video course will teach you all about Teradata indexing techniques.

By taking this two-hour journey, you will learn all about the Teradata Architecture, Full Table Scans, Primary Index Access (UPI, NUPI), Secondary Index Access (USI, NUSI) and Join Indexes.

You will get to know all the details necessary to build a perfect indexing landscape in your data warehouse environment.
