fbpx

Efficient use of Teradata Recursive Queries

By Roland Wenzlofsky

April 29, 2018


Teradata Recursive Queries

SQL offers two methods to create a Teradata recursive query. We can either create a query using the WITH RECURSIVE clause or create a view using the CREATE RECURSIVE VIEW statement.

Each recursive query consists of 2 or 3 parts, depending on which of the above approaches is used:

  • The seed statement is the initial query that runs.
  • The recursive statement is the repeating query, which is executed until no further rows are returned. To avoid infinite loops, we usually add a termination condition to the recurrent query. Without a termination condition, the user may run out of spool space.
  • The final query returns the result of the seed query and all iterations of the recursive query.

Creating a recursive query using the WITH RECURSIVE clause:

WITH RECURSIVE <query>
(<columns>) AS
(
<seed query>
UNION ALL
<recursive query>
)
<final query>;

Creating a recursive query using a RECURSIVE VIEW:

CREATE RECURSIVE VIEW <view>
(<columns>) AS
(
<seed query>
UNION ALL
<recursive query>
);

To get a better understanding of how recursive queries work on Teradata, I prepared a common problem we want to solve: Finding the shortest paths in a graph.

Shortest Path – The Recursive Solution

We will store all available legs in the below table:

CREATE SET TABLE Leg
(
From_Id INTEGER NOT NULL,
To_Id INTEGER NOT NULL
)
PRIMARY INDEX (From_Id);

/* Example Graph */

INSERT into Leg VALUES (‘1′,’2’);
INSERT into Leg VALUES (‘2′,’1’);
INSERT into Leg VALUES (‘2′,’3’);
INSERT into Leg VALUES (‘2′,’7’);
INSERT into Leg VALUES (‘3′,’2’);
INSERT into Leg VALUES (‘3′,’4’);
INSERT into Leg VALUES (‘3′,’5’);
INSERT into Leg VALUES (‘3′,’6’);
INSERT into Leg VALUES (‘6′,’7’);
INSERT into Leg VALUES (‘6′,’3’);
INSERT into Leg VALUES (‘7′,’2’);
INSERT into Leg VALUES (‘7′,’6’);
INSERT into Leg VALUES (‘7′,’8’);
INSERT into Leg VALUES (‘7′,’9’);
INSERT into Leg VALUES (‘8′,’7’);
INSERT into Leg VALUES (‘9′,’7’);
INSERT into Leg VALUES (‘5′,’3’);
INSERT into Leg VALUES (‘4′,’3’);

The solution with a recursive query:

WITH RECURSIVE ThePath
(From_Id, To_Id, Path, TheLength) AS
(
SELECT
From_Id
,To_Id
,(TRIM(From_Id) || ‘,’ || TRIM(To_Id)) (VARCHAR(512)) AS Path
, 1 AS TheLength
FROM
Leg
WHERE
From_Id = ‘1’
UNION ALL
SELECT
ThePath.From_Id
,t01.To_Id
,ThePath.Path || ‘,’ || TRIM(t01.To_Id)
,ThePath.TheLength + 1 AS TheLength
FROM
Leg t01
INNER JOIN
ThePath
ON
t01.From_Id = ThePath.To_Id
WHERE POSITION(‘,’ || TRIM(t01.To_Id) || ‘,’ IN ‘,’ || ThePath.Path || ‘,’) = 0
— Above WHERE condition ensures that we do not revisit a node a second time!
AND ThePath.TheLength <= 100
— Avoid out of spool situations, put a fixed stop after 100 recursions!
)

/* Below statement ensures that if there are multiple routes between two nodes, one of
the minimum numbers of stops are chosen */

SELECT
From_Id,
To_Id,
Path,
TheLength
FROM ThePath
QUALIFY ROW_NUMBER() OVER (PARTITION BY From_Id, To_Id ORDER BY TheLength, Path) = 1
ORDER BY 1,4,3;

Here is the result set of the query, showing all minimum distance routes from node 1

FROM_IDTO_IDPATHTheLength
121,21
131,2,32
171,2,72
141,2,3,43
151,2,3,53
161,2,3,63
181,2,7,83
191,2,7,93

Shortest Path – The Non-Recursive Solution

We will now compare the recursive query with a solution written as Stored Procedure:

CREATE SET TABLE Route
(
From_Id INTEGER NOT NULL,
To_Id INTEGER NOT NULL,
Path VARCHAR(512),
TheLength INTEGER
) PRIMARY INDEX (From_Id);

REPLACE PROCEDURE ThePath()
DYNAMIC RESULT SETS
BEGIN
DELETE FROM ShortestPath;
INSERT INTO ShortestPath
SELECT
From_Id,
To_Id,
(TRIM(From_Hub_Id) || ‘,’ || TRIM(To_Hub_Id)) (VARCHAR(512)) AS Path,
1 AS TheLength
FROM Leg
WHERE From_id = ‘1’;

WHILE ACTIVITY_COUNT > 0 DO
INSERT INTO Route
SELECT
t02.From_Id
t01.To_Id
t02.Path || ‘,’ || TRIM(t01.To_Id)
t02.TheLength + 1
FROM Leg t01, Route t02
WHERE
t01.From_Id = t02.To_Id
AND t01.To_Id <> ‘1’
AND t02.TheLength =
(SELECT MAX(TheLength) FROM Route)
AND t01.To_Id NOT IN
(SELECT To_Id FROM Route)
AND B.TheLength < 200
QUALIFY ROW_NUMBER() OVER (PARTITION BY t01.To_Id ORDER BY t02.Path) = 1;
END WHILE;
BEGIN
DECLARE mycursor CURSOR WITH RETURN ONLY FOR
SELECT * FROM Route  ORDER BY 4, 3;
OPEN mycursor;
END;
END;

In general, it can’t be said which solution is faster or slower. It depends on how data structures accessed look like.

In this specific example, the stored procedure consumes fewer IOs and CPU that the recursive query.

There are several reasons:

  • Our stored procedure keeps a routes table of all visited nodes, while the recursive query might revisit the same node several times.
  • The recursive query continues to iterate even after the shortest path between two nodes has already been found. Running additional recursive steps increases spool usage quickly.
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"62516":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"62516":{"val":"var(--tcb-skin-color-0)"}},"gradients":[]},"original":{"colors":{"62516":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"b4fbe":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"b4fbe":{"val":"rgb(241, 99, 52)"}},"gradients":[]},"original":{"colors":{"b4fbe":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
Previous Article
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"b4fbe":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"b4fbe":{"val":"rgb(241, 99, 52)"}},"gradients":[]},"original":{"colors":{"b4fbe":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
Next Article
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"62516":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"62516":{"val":"rgb(255, 0, 0)"}},"gradients":[]}}]}__CONFIG_colors_palette__
GET OUR TERADATA BOOK

Roland Wenzlofsky

Roland Wenzlofsky is an experienced freelance Teradata Consultant & Performance Trainer. Born in Austria's capital Vienna, he is building and tuning some of the largest Teradata Data Warehouses in the European financial and telecommunication sectors for more than 20 years. He has played all the roles of developer, designer, business analyst, and project manager. Therefore, he knows like no other the problems and obstacles that make many data warehouse projects fail and all the tricks and tips that will help you succeed.

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

You might also like

>