Home Blog

The Teradata Access Paths

0

The Teradata OVERLAPS Command

0
Teradata OVERLAPS

With historically managed tables, it is often necessary to find out whether two time periods overlap. Usually, several comparisons are necessary.

The Teradata OVERLAPS command makes it easier.

Here is an example:

SELECT ‘Overlapping’ WHERE (DATE ‘2019-01-01’, DATE ‘2019-06-30’) OVERLAPS(DATE ‘2019-05-31’, DATE ‘2019-12-31’);

-> Result: ‘Overlapping

Caution is advised when the end of an interval is equal to the start of the second interval. The OVERLAPS command does not consider this to be an overlap:

SELECT ‘Overlapping’ WHERE (DATE ‘2019-01-01’, DATE ‘2019-06-30’) OVERLAPS(DATE ‘2019-06-30’, DATE ‘2019-12-31’);

-> No rows returned

If a date is part of the consideration, at least 2 days must be overlapping. So the following query recognizes the overlap as expected:

SELECT ‘Overlapping’ WHERE (DATE ‘2019-01-01’, DATE ‘2019-06-30’) OVERLAPS(DATE ‘2019-06-29’, DATE ‘2019-12-31’);

-> Result: ‘Overlapping

Analogous, for example, to 2 intervals with date, it also applies to intervals with time that the overlap must be at least 2 seconds:

Thus, the following query does not return an overlap:

SELECT ‘Overlapping’ (TITLE ‘ ‘ ‘)WHERE(TIME ’03:00:00′, TIME ’05:00:00′) OVERLAPS(TIME ’05:00:00′, TIME ’07:00:00’) ;

-> No rows returned

At least 2 seconds of overlap are necessary to detect such an overlap:

SELECT ‘Overlapping’ (TITLE ‘ ‘ ‘)WHERE(TIME ’03:00:00′, TIME ’05:00:00′) OVERLAPS(TIME ’04:59:59′, TIME ’07:00:00’) ;

-> Result: ‘Overlapping

Of course the OVERLAPS command can also be used for TIMESTAMPS:

SELECT ‘Overlapping’ (TITLE ‘ ‘)WHERE(TIMESTAMP ‘2019-01-01 03:00:00’, TIMESTAMP ‘2019-06-30 05:00:00’) OVERLAPS(TIMESTAMP ‘2019-06-30 05:00:00’, TIMESTAMP ‘2019-12-31 07:00:00’) ;

-> No rows returned

SELECT ‘Overlapping’ (TITLE ‘ ‘)WHERE(TIMESTAMP ‘2019-01-01 03:00:00’, TIMESTAMP ‘2019-06-30 05:00:00’) OVERLAPS(TIMESTAMP ‘2019-06-30 04:59:59’, TIMESTAMP ‘2019-12-31 07:00:00’) ;

-> Result: ‘Overlapping

As you can see, the OVERLAPS command provides a simple syntax to detect overlapping intervals. Only the peculiarity that at least 2 days (date) or 2 seconds (time) must be difference to recognize an overlap must be considered if necessary.

Finally, there is a special case, which may lead to an unexpected result for you:

SELECT ‘Overlapping’ (TITLE ‘ ‘ ‘)WHERE(TIME ’05:00:00′, TIME ’03:00:00′) OVERLAPS(TIME ’03:00:01′, TIME ’04:00:00’) ;

What do you think is being delivered back here? Leave a comment with your solution!

Troubleshooting the Teradata LIKE Operator

2
Teradata LIKE

Have you ever noticed that the Teradata LIKE operator behaves differently depending on whether you apply it to a column of data type CHAR or VARCHAR? Let us take the table below as an example:

CREATE TABLE TheLikeTest
(
TheVarchar VARCHAR(20),
TheChar CHAR(20)
);


We’re inserting exactly one row for our demonstration:

INSERT INTO TheLikeTest VALUES (‘Anytext’,’Anytext’);

First, we run the following SQL statement for the column of datatype VARCHAR:

SELECT TheVarchar FROM TheLikeTest WHERE TheVarchar LIKE ‘%t’;
-> Result: ‘Anytext’ (1 Row returned)

Next, we execute the same query on the column of data type CHAR:
SELECT TheChar FROM TheLikeTest WHERE TheChar LIKE ‘%t’;
-> Result: No rows returned

As we see, the second query does not return a result. How is that possible?

The solution to this puzzle is simple: Since it is a column of the data type CHAR, space characters are padded to 20 characters. Our query expects a ‘t’ at the end of the column content, but there is certainly a space character there!

How can this undesired behaviour be avoided? Leave a comment with your solution!

Teradata Advanced Grouping Options

2
Teradata advanced grouping

Teradata offers three advanced GROUPING options in addition to the usually used GROUP BY statement:

– GROUP BY GROUPING SETS
– GROUP BY ROLLUP
– GROUP BY CUBE

As an example we take a table containing the following information:

Flights per plane, year and month.

CREATE MULTISET TABLE Flights
(
  PLANE AS BIGINT NOT NULL,
  FLIGHTDATE AS DATE NOT NULL,
  NR_FLIGHTS INTEGER NOT NULL
) PRIMARY INDEX (PLANE);

PlaneFlightDateFlights
12019-02-265
22019-02-262
32019-02-263
12019-02-271
22019-02-272
32019-02-275
12019-02-285
22019-02-281
32019-02-281
12019-03-015
22019-03-013
32019-03-012
12019-03-021
22019-03-021
32019-03-021

GROUP BY GROUPING SETS

SELECT
PLANE,
EXTRACT(MONTH FROM FLIGHTDATE) AS TheMonth,
EXTRACT(YEAR FROM FLIGHTDATE) AS TheYear,
SUM(NR_FLIGHTS) AS NR_FLIGHTS
FROM
FLIGHTS
GROUP BY GROUPING SETS (PLANE,TheMonth,TheYear)
ORDER BY 1 DESC,2 DESC,3 DESC;

PlaneTheMonthTheYearNR_FLIGHTS
3NULLNULL12
2NULLNULL9
1NULLNULL12
NULL3NULL13
NULL2NULL25
NULLNULL201938

GROUP BY GROUPING SETS calculates the number of flights for each plane, each month, each year.

GROUP BY ROLLUP

SELECT
PLANE,
EXTRACT(MONTH FROM FLIGHTDATE) AS TheMonth,
EXTRACT(YEAR FROM FLIGHTDATE) AS TheYear,
SUM(NR_FLIGHTS) AS NR_FLIGHTS
FROM
FLIGHTS
GROUP BY ROLLUP (Plane,TheMonth,TheYear)
ORDER BY 1 DESC,2 DESC,3 DESC;

PlaneTheMonthTheYearNR_FLIGHTS
3320193
33NULL3
3220199
32NULL9
3NULLNULL12
2320194
23NULL4
2220195
22NULL5
2NULLNULL9
1320196
13NULL6
12201911
12NULL11
1NULLNULL17
NULLNULLNULL38

GROUP BY ROLLUP calculates the number of flights for each plane, each distinct month, each month per year, each year plus the grand total.

GROUP BY CUBE

SELECT
PLANE,
EXTRACT(MONTH FROM FLIGHTDATE) AS TheMonth,
EXTRACT(YEAR FROM FLIGHTDATE) AS TheYear,
SUM(NR_FLIGHTS) AS NR_FLIGHTS
FROM
FLIGHTS
GROUP BY CUBE (Plane,TheMonth,TheYear)
ORDER BY 1 DESC,2 DESC,3 DESC;

PlaneTheMonthTheYearNR_FLIGHTS
3320193
33NULL3
3220199
32NULL9
3NULL201912
3NULLNULL12
2320194
23NULL4
2220195
22NULL5
2NULL20199
2NULLNULL9
1320196
13NULL6
12201911
12NULL11
1NULL201917
1NULLNULL17
NULL3201913
NULL3NULL13
NULL2201925
NULL2NULL25
NULLNULL201938
NULLNULLNULL38

Teradata advanced grouping options help to achieve grouping on different levels at the same time, thus increasing performance

Joins as a Main Target for Teradata Tuning

1
Teradata UNION ALL

Joining tables is one of the most expensive operations of an SQL statement. This is because each join requires the rows to be joined on a common AMP.

If the two tables do not have the same primary index, it is necessary to copy the rows of one or both tables. This can cause a huge number of IOs in large tables.

For each join, the Teradata Optimizer can choose between different types of joins, each suitable for a specific initial scenario. The main goal of the Teradata Optimizer is to minimize resource consumption (IOs, CPU consumption, etc.) for the entire execution plan.

General Join Optimization Considerations

The optimizer makes use of available statistics in order to determine the optimal join plan.

The optimizer‘s cost estimations for each join are based on the following factors: The expected cardinality of the output spool (after the join) and the expected cardinality of the input spools after WHERE conditions have been applied.

The selected column lists of each input spool and the average row size. Usually, only the selected columns of each input spool are taken over into the resulting spool.

The only exception is if a permanent table is joined directly without spooling.

In such a case all columns of this table are taken over into the resulting spool.

Based on the above observations we can draw the following conclusions regarding performance:

The important statistics for each join are the ones on the join columns and on any available WHERE condition.
Whenever possible we should apply WHERE conditions in our queries in order to reduce the cardinality of the input tables.


further, we should only select the columns required in the result set and avoid „SELECT * FROM“ queries.

Teradata Space Limit Features

0

With the Teradata Version 16.00 some interesting features in the area of Space Management have been introduced, which can bring enormous improvement in the daily activities in the area of administration and loading of the data warehouse.

In my current environment, it regularly happens that the daily loading of the data warehouse is interrupted because there is not enough permanent space available in the databases.

As we all know, the maximum available space is defined when a database is created. Here is the syntax to create a database with 80 GB permspace:

CREATE DATABASE Customer AS PERM = 800000000000;

The entire available space is then evenly allocated to all AMPs. For simplicity’s sake, we assume 8 AMPs. In this case each AMP has 10GB of permanent space available.

Limits for Skewed Tables

This even distribution of the available PermSpace to all AMPs is based on the assumption that all rows of all tables are distributed evenly using the primary index. But what happens if this is not the case?

Let’s assume that we want to load a table with 11GB of data, and the designer of the table structure has chosen the Primary Index so clumsy that all rows are assigned to exactly one AMP. Since the space limit on each AMP is 10GB,
we will receive the following error message:

2644: No more room in database Customer.

This means that although the remaining 7 AMPs each have 10GB available, the table cannot be loaded.

The maximum available space in a database is shown in table DBC.DATABASESPACE in the MaxPermSpace column. The current occupied space can be seen in Column CurrentPermSpace. In CurrentPermSpace, however, only the entire occupied space of the database is displayed, not the occupied space per AMP.

I’ve often seen surprised faces of developers who were surprised that when there was enough space (MaxPermSpace – CurrentPermSpace) the error message “2644: No more room in database Customer” appeared.

In order to reduce or completely prevent such problems, Teradata 16.00 offers the possibility to globally define the available space.

Therefore the new DBC Table GlobalDBSpace, as well as an extended syntax for creating a database was introduced:

CREATE DATABASE Customer AS PERM = 800000000000 SKEW = 10 PERCENT;

The 10 PERCENT Skew given when creating the database mean the following:

The total PermSpace available is still 80GB. Nothing’s gonna change that. However, each of the 8 available AMPs is allowed to hold up to 11GB of data if it does not exceed the absolute upper limit of 80GB of the database.

In our example from above, in which the clumsy selection of the primary index would insert all rows on exactly one AMP, the error “2644 No more room in database Customer” would no longer occur. However, such a situation is recorded in the Software Event Log.

While the feature described above simplifies the handling of skew tables, another feature was introduced with Teradata 16.00, which has the purpose to prevent loads from crashing if not enough database space has been defined:

Global Space Soft Limit.

This is defined via DBSControl GlobalSPaceSoftLimit and specified as a percentage.

In principle, a percentage is defined here above which the defined PermSpace may be exceeded without the error “2644 No more room in database Customer” occurring.

For example, if the database is defined on an 8 AMP system with 80GB PermSpace, and a soft limit of 10%, up to 11GB PermSpace can be used per AMP.

Again, an event is triggered and written to the Software Event Log. This allows database administrators to analyze the situation afterwards and take appropriate action.

Subscribe to our Newsletter

Join 4500+ subscribers receiving our free newsletter.
No spam, ever. Just great stuff.

In order to activate your subscription, check your email and click on the link!

x
Analytics by GoSquared