Teradata SQL and Advanced GROUPING Functions
What advanced GROUPING methods are there in Teradata SQL?
– GROUP BY GROUPING SETS
– GROUP BY ROLLUP
– GROUP BY CUBE
What is GROUP BY CUBE used for?
One or more columns are used to group across multiple dimensions.
If grouped by n columns, the result is 2 to the power of n rows, as shown in the graph:
What is GROUP BY ROLLUP used for?
This functionality is used to group columns over a single dimension calculating multiple levels of detail.
What is GROUP BY GROUPING SET used for?
Groupes over one dimension without ROLLUP or several dimensions without CUBE
What is a GROUPING SET?
A set of columns over which we group.
What is an empty GROUPING SET?
It is represented by a left and right bracket “()” and returns a grand total
We will explain the advanced GROUPING functions using the following example. The basis for this is a table that contains the number of flights per aircraft and date:
CREATE MULTISET TABLE Flights ( PLANE AS BIGINT NOT NULL, FLIGHTDATE AS DATE NOT NULL, NR_FLIGHTS INTEGER NOT NULL ) PRIMARY INDEX (PLANE);
Below you can see our test data:
Plane | FlightDate | Flights |
1 | 2019-02-26 | 5 |
2 | 2019-02-26 | 2 |
3 | 2019-02-26 | 3 |
1 | 2019-02-27 | 1 |
2 | 2019-02-27 | 2 |
3 | 2019-02-27 | 5 |
1 | 2019-02-28 | 5 |
2 | 2019-02-28 | 1 |
3 | 2019-02-28 | 1 |
1 | 2019-03-01 | 5 |
2 | 2019-03-01 | 3 |
3 | 2019-03-01 | 2 |
1 | 2019-03-02 | 1 |
2 | 2019-03-02 | 1 |
3 | 2019-03-02 | 1 |
Teradata SQL – GROUP BY GROUPING SETS
Groups data either on one dimension without ROLLUP or on several dimensions one CUBE. Here is an example:
SELECT PLANE, EXTRACT(MONTH FROM FLIGHTDATE) AS TheMonth, EXTRACT(YEAR FROM FLIGHTDATE) AS TheYear, SUM(NR_FLIGHTS) AS NR_FLIGHTS FROM FLIGHTS GROUP BY GROUPING SETS (PLANE,TheMonth,TheYear) ORDER BY 1 DESC,2 DESC,3 DESC;
Plane | TheMonth | TheYear | NR_FLIGHTS |
3 | NULL | NULL | 12 |
2 | NULL | NULL | 9 |
1 | NULL | NULL | 12 |
NULL | 3 | NULL | 13 |
NULL | 2 | NULL | 25 |
NULL | NULL | 2019 | 38 |
In our example, GROUP BY GROUPING SETS calculates the number of flights for each plane, each month, each year.
Details about GROUP BY GROUPING SET
- All columns of GROUPING sets must reference column names. The referencing cannot be done using the position as it is possible in simple GROUP BY (GROUP BY 1,2).
- Each GROUPING SET may contain a maximum of 190 columns.
- Large objects cannot be used in a GROUPING SET (CLOB, BLOB). They must first be cast into another data type. CLOB can be cast to CHAR or VARCHAR. BLOB can be cast to BYTE or VARBYTE.
Teradata SQL – GROUP BY ROLLUP
Groups data on a single dimension with more than one detail level. Here is an example:
SELECT PLANE, EXTRACT(MONTH FROM FLIGHTDATE) AS TheMonth, EXTRACT(YEAR FROM FLIGHTDATE) AS TheYear, SUM(NR_FLIGHTS) AS NR_FLIGHTS FROM FLIGHTS GROUP BY ROLLUP (Plane,TheMonth,TheYear) ORDER BY 1 DESC,2 DESC,3 DESC;
Plane | TheMonth | TheYear | NR_FLIGHTS |
3 | 3 | 2019 | 3 |
3 | 3 | NULL | 3 |
3 | 2 | 2019 | 9 |
3 | 2 | NULL | 9 |
3 | NULL | NULL | 12 |
2 | 3 | 2019 | 4 |
2 | 3 | NULL | 4 |
2 | 2 | 2019 | 5 |
2 | 2 | NULL | 5 |
2 | NULL | NULL | 9 |
1 | 3 | 2019 | 6 |
1 | 3 | NULL | 6 |
1 | 2 | 2019 | 11 |
1 | 2 | NULL | 11 |
1 | NULL | NULL | 17 |
NULL | NULL | NULL | 38 |
GROUP BY ROLLUP calculates the number of flights for each plane, each distinct month, each month per year, each year plus the total.
Details about GROUP BY ROLLUP
- Several GROUPING SETS can be defined in one SQL. In our example, we use only one (Plane, TheMonth, TheYear)
- Large objects cannot be used in a GROUPING SET (CLOB, BLOB). They must first be cast into another data type. CLOB can be cast to CHAR or VARCHAR. BLOB can be cast to BYTE or VARBYTE.
- All columns of GROUPING sets must reference column names. The referencing cannot be done using the position as in simple GROUP BY (GROUP BY 1,2).
- GROUP BY ROLLUP on a single column corresponds to a simple GROUP BY statement
- Each GROUPING SET may contain a maximum of 190 columns.
Teradata SQL – GROUP BY CUBE
Groups the data in several dimensions. Here is an example:
SELECT PLANE, EXTRACT(MONTH FROM FLIGHTDATE) AS TheMonth, EXTRACT(YEAR FROM FLIGHTDATE) AS TheYear, SUM(NR_FLIGHTS) AS NR_FLIGHTS FROM FLIGHTS GROUP BY CUBE (Plane,TheMonth,TheYear) ORDER BY 1 DESC,2 DESC,3 DESC;
Plane | TheMonth | TheYear | NR_FLIGHTS |
3 | 3 | 2019 | 3 |
3 | 3 | NULL | 3 |
3 | 2 | 2019 | 9 |
3 | 2 | NULL | 9 |
3 | NULL | 2019 | 12 |
3 | NULL | NULL | 12 |
2 | 3 | 2019 | 4 |
2 | 3 | NULL | 4 |
2 | 2 | 2019 | 5 |
2 | 2 | NULL | 5 |
2 | NULL | 2019 | 9 |
2 | NULL | NULL | 9 |
1 | 3 | 2019 | 6 |
1 | 3 | NULL | 6 |
1 | 2 | 2019 | 11 |
1 | 2 | NULL | 11 |
1 | NULL | 2019 | 17 |
1 | NULL | NULL | 17 |
NULL | 3 | 2019 | 13 |
NULL | 3 | NULL | 13 |
NULL | 2 | 2019 | 25 |
NULL | 2 | NULL | 25 |
NULL | NULL | 2019 | 38 |
NULL | NULL | NULL | 38 |
Details about GROUP BY CUBE
- Several GROUPING SETS can be defined in one Teradata SQL. In our example, we use only one (Plane, TheMonth, TheYear)
- All columns of GROUPING sets must reference column names. The referencing cannot be done using the position as in simple GROUP BY (GROUP BY 1,3).
- Large objects cannot be used in a GROUPING SET (CLOB, BLOB). They must first be cast into another data type. CLOB can be cast to CHAR or VARCHAR. BLOB can be cast to BYTE or VARBYTE.
- Each GROUPING SET may contain a maximum of 8 columns
Teradata advanced grouping options help to achieve grouping on different levels at the same time, thus increasing performance