fbpx

Teradata Vantage Historisation

By Paul Timar

August 31, 2020


Situation and Setup of a Historisation in Teradata Vantage

Imagine that the time has come to improve your physical data model. Flawed or obsolete Teradata Vantage historisation design decisions need to be corrected. You are asked to merge tables of the same entity that hold different attributes in separation. We do not question why there are separate tables of the same entity at this point. Instead, we offer you a systematic approach for transforming any starting constellation to any end constellation under the constraint of any possible historisation setting on either side. By simple, we mean that there is only one time layer added but no additional validity layer.

Before we introduce the possible combinations of what our merge task looks like, let us define what historisation levels we could have in the tables that are input to the merge task:

0 = no historisation. Table contents are entered as they come. There is no created attribute to reveal when the data came and how long they were valid.

1 = simple historisation in Teradata Vantage. The table content is time-layered to indicate the period over which one row holds relevant information. This time layer is structured either by a CDD column so that per primary key and day loaded; there is one row stored or by two columns RSD and RED so that one row per primary key can cover a variable period and a new one is created only if non-key attributes have changed.

2 = double historisation in Teradata Vantage. Additional to the simple historisation, a validity layer is introduced via VSD and VED attributes. This allows for retroactive recreation of change histories.

Here are the parameters you have to set to determine which design case your task is to pick the appropriate code pattern:

  • Historisation level of input table A.
  • Historisation level of input table B.
  • Historisation level of the output table.

As a meta-decision, we standardize the level 1 historisation to using RSD and RED even if we have it implemented in a CDD. In merging, we can select CDD one time as RSD and one time as RED if we face a snapshot history in one table instead.

We present two fictional tables and one version per historisation to walk through every non-trivial example. Here are the versions of the first table, Table_A

CREATE SET TABLE TABLE_A0 ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
ENT_ID BIGINT NOT NULL,
C_STATUS CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
D_LEVEL_CD CHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC,
E_VALUE INTEGER )
PRIMARY INDEX (ENT_ID );

CREATE SET TABLE TABLE_A1 ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
ENT_ID BIGINT NOT NULL,
CDD DATE FORMAT ‘YYYY-MM-DD’ NOT NULL,
C_STATUS CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
D_LEVEL_CD CHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC,
E_VALUE INTEGER )
PRIMARY INDEX (ENT_ID )
PARTITION BY RANGE_N(CDD BETWEEN ‘2011-01-01’ AND DATE ‘2017-12-31’ EACH INTERVAL ‘1’ DAY , NO RANGE);

CREATE SET TABLE TABLE_A2 ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(ENT_ID BIGINT NOT NULL,
CDD DATE FORMAT ‘YYYY-MM-DD’ NOT NULL,
C_STATUS CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
D_LEVEL_CD CHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC,
E_VALUE INTEGER ,
VSD TIMESTAMP(6) FORMAT ‘YYYY-MM-DDHH:MI:SS’ NOT NULL,
VED TIMESTAMP(6) FORMAT ‘YYYY-MM-DDHH:MI:SS’ NOT NULL )
PRIMARY INDEX (ENT_ID )
PARTITION BY RANGE_N(CDD BETWEEN ‘2011-01-01’ AND DATE ‘2017-12-31’ EACH INTERVAL ‘1’ DAY , NO RANGE);

Here are the versions of the second table, Table_B:

CREATE SET TABLE TABLE_B0 ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
ENT_ID BIGINT NOT NULL,
F_STATUS CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
G_LEVEL_CD CHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC,
H_VALUE INTEGER )
PRIMARY INDEX (ENT_ID );

CREATE SET TABLE TABLE_B1 ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(ENT_ID BIGINT NOT NULL,
RSD DATE FORMAT ‘YYYY-MM-DD’ NOT NULL,
RED DATE FORMAT ‘YYYY-MM-DD’ NOT NULL,
F_STATUS CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
G_LEVEL_CD CHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC,
H_VALUE INTEGER )
PRIMARY INDEX (ENT_ID );

CREATE SET TABLE TABLE_B2 ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
ENT_ID BIGINT NOT NULL,
RSD DATE FORMAT ‘YYYY-MM-DD’ NOT NULL,
VSD TIMESTAMP(6) FORMAT ‘YYYY-MM-DDHH:MI:SS’ NOT NULL,
RED DATE FORMAT ‘YYYY-MM-DD’ NOT NULL,
VED TIMESTAMP(6) FORMAT ‘YYYY-MM-DDHH:MI:SS’ NOT NULL,
F_STATUS CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
G_LEVEL_CD CHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC,
H_VALUE INTEGER )
PRIMARY INDEX (ENT_ID );

Our target table can have one of the following three forms:

CREATE SET TABLE TARGET_TABLE0 ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
ENT_ID BIGINT NOT NULL,
C_STATUS CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
D_LEVEL_CD CHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC,
E_VALUE INTEGER ,
F_STATUS CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
G_LEVEL_CD CHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC,
H_VALUE INTEGER )
PRIMARY INDEX (ENT_ID );

CREATE SET TABLE TARGET_TABLE1 ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(
ENT_ID BIGINT NOT NULL,
RSD DATE FORMAT ‘YYYY-MM-DD’ NOT NULL,
RED DATE FORMAT ‘YYYY-MM-DD’ NOT NULL,
C_STATUS CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
D_LEVEL_CD CHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC,
E_VALUE INTEGER ,
F_STATUS CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
G_LEVEL_CD CHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC,
H_VALUE INTEGER )
PRIMARY INDEX (ENT_ID );

CREATE SET TABLE TARGET_TABLE2 ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT
(ENT_ID BIGINT NOT NULL,
RSD DATE FORMAT ‘YYYY-MM-DD’ NOT NULL,
VSD TIMESTAMP(6) FORMAT ‘YYYY-MM-DDHH:MI:SS’ NOT NULL,
RED DATE FORMAT ‘YYYY-MM-DD’ NOT NULL,
VED TIMESTAMP(6) FORMAT ‘YYYY-MM-DDHH:MI:SS’ NOT NULL,
C_STATUS CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
D_LEVEL_CD CHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC,
E_VALUE INTEGER ,
F_STATUS CHAR(1) CHARACTER SET LATIN NOT CASESPECIFIC NOT NULL,
G_LEVEL_CD CHAR(3) CHARACTER SET LATIN NOT CASESPECIFIC,
G_VALUE INTEGER )
PRIMARY INDEX (ENT_ID );

To show the effects of our case studies and prove to ourselves that the code we provide works, we insert a small set of dummy records into the table versions of A and B.

The target table versions are empty by definition and only serve as an illustration of where we want to go with the data.

— fill the A Table versions

INSERT INTO TABLE_A0 (ENT_ID,C_STATUS,D_LEVEL_CD,E_VALUE ) VALUES (1,’A’,’X’,100) ;
INSERT INTO TABLE_A0 (ENT_ID,C_STATUS,D_LEVEL_CD,E_VALUE ) VALUES (2,’B’,’Y’,200) ;

INSERT INTO TABLE_A1 (ENT_ID,CDD,C_STATUS,D_LEVEL_CD,E_VALUE ) VALUES (4,DATE’2014-01-19′,’A’,’X’,55) ;
INSERT INTO TABLE_A1 (ENT_ID,CDD,C_STATUS,D_LEVEL_CD,E_VALUE ) VALUES (4,DATE’2014-01-20′,’A’,’X’,55) ;

 

INSERT INTO TABLE_A2 (ENT_ID,RSD,VSD,RED,VED,C_STATUS,D_LEVEL_CD,E_VALUE ) VALUES (1,DATE’2014-01-01′,TIMESTAMP ‘2014-01-01 07:33:59.000000′, DATE’2014-01-01’, TIMESTAMP ‘2014-01-02 07:59:58.000000′ ,’A’,’X’,0) ;
INSERT INTO TABLE_A2 (ENT_ID,RSD,VSD,RED,VED,C_STATUS,D_LEVEL_CD,E_VALUE ) VALUES (1,DATE’2014-01-02′,TIMESTAMP ‘2014-01-01 07:33:59.000000′,DATE’2014-01-02’,TIMESTAMP ‘2014-01-02 07:59:58.000000′ ,’A’,’X’,0) ;

INSERT INTO TABLE_A2 (ENT_ID,RSD,VSD,RED,VED,C_STATUS,D_LEVEL_CD,E_VALUE ) VALUES (4,DATE’2014-01-05′,TIMESTAMP ‘2014-01-05 09:09:09.000000′,DATE’2014-01-18’ ,TIMESTAMP ‘3000-12-31 23:59:59.000000′ ,’A’,’X’,0 ) ;
INSERT INTO TABLE_A2 (ENT_ID,RSD,VSD,RED,VED,C_STATUS,D_LEVEL_CD,E_VALUE ) VALUES (4,DATE’2014-01-19′, TIMESTAMP ‘2014-01-19 03:39:40.000000′,DATE’3000-12-31’, TIMESTAMP ‘3000-12-31 23:59:59.000000′,’A’,’X’,55) ;

— fill the B Table versions

INSERT INTO TABLE_B0 (ENT_ID,F_STATUS,G_LEVEL_CD,H_VALUE ) VALUES (1,’F’,’Q’,5) ;
INSERT INTO TABLE_B0 (ENT_ID,F_STATUS,G_LEVEL_CD,H_VALUE ) VALUES (2,’J’,’V’,20) ;
INSERT INTO TABLE_B0 (ENT_ID,F_STATUS,G_LEVEL_CD,H_VALUE ) VALUES (4,’L’,’V’,20) ;

INSERT INTO TABLE_B1 (ENT_ID,RSD,RED,F_STATUS,G_LEVEL_CD,H_VALUE ) VALUES (1,DATE’2014-01-01′,DATE’2014-01-02′,’F’,’Q’,0) ;
INSERT INTO TABLE_B1 (ENT_ID,RSD,RED,F_STATUS,G_LEVEL_CD,H_VALUE ) VALUES (1,DATE’2014-01-03′,DATE’3000-12-31′,’F’,’Q’,5) ;

INSERT INTO TABLE_B1 (ENT_ID,RSD,RED,F_STATUS,G_LEVEL_CD,H_VALUE ) VALUES (4,DATE’2014-01-18′,DATE’2014-01-21′,’K’,’V’,20) ;
INSERT INTO TABLE_B1 (ENT_ID,RSD,RED,F_STATUS,G_LEVEL_CD,H_VALUE ) VALUES (4,DATE’2014-01-22′,DATE’3000-12-31′,’L’,’V’,20) ;

 INSERT INTO TABLE_B2 (ENT_ID,RSD,RED,F_STATUS,G_LEVEL_CD,H_VALUE,VSD,VED ) VALUES (1,DATE’2014-01-01′,DATE’3000-12-31′,’F’,’Q’,0,TIMESTAMP ‘2014-01-01 07:33:59.000000’,TIMESTAMP ‘2014-01-02 07:59:58.000000’) ;
INSERT INTO TABLE_B2 (ENT_ID,RSD,RED,F_STATUS,G_LEVEL_CD,H_VALUE,VSD,VED ) VALUES (1,DATE’2014-01-01′,DATE’2014-01-02′,’F’,’Q’,0,TIMESTAMP ‘2014-01-02 07:59:59.000000’,TIMESTAMP ‘3000-12-31 23:59:59.000000’) ;

INSERT INTO TABLE_B2 (ENT_ID,RSD,RED,F_STATUS,G_LEVEL_CD,H_VALUE,VSD,VED ) VALUES (4,DATE’2014-01-18′,DATE’2014-01-21′,’K’,’V’,20,TIMESTAMP ‘2014-01-18 08:08:34.00000’,TIMESTAMP ‘2014-01-22 07:48:11.00000’ )
INSERT INTO TABLE_B2  (ENT_ID,RSD,RED,F_STATUS,G_LEVEL_CD,H_VALUE,VSD,VED ) VALUES (4,DATE’2014-01-22′, DATE’3000-12-31′,’L’,’V’,20,TIMESTAMP ‘2014-01-22 07:48:12.000000’,TIMESTAMP ‘3000-12-31 23:59:59.000000’ ) ;

Note that the two tables are not synchronous concerning entities or historisation events. This is a wanton disorder that forces us to foresee such a situation and treat it in a meaningful way in the code examples to come.

In the coming parts of this series, we will systematically walk through each of the cases mentioned above, interpret it and provide SQL solutions whenever there is more to do than just a simple merge of two tables.

Cases will be described with the pattern A-B->T, where A, B, and T can be any of the historisation levels 0 to 2.

Teradata Vantage Downleveling Cases for Historisation

Next, we discuss the removal of historisation in Teradata Vantage. Assume that the task is presented to you as bringing table contents into a “stable” or “final” state or as copying the contents into a “present-only” environment while you merge. Please refer to part 1 of this article series to understand the table design and content. Historisation down leveling cases are all of the following:

(1-0-0, 0-1-0, 1-1-0, 0-2-0, 2-2-0, 1-2-0, 0-2-1, 1-2-1, 2-2-1)

 Case 1-0->0 We downlevel table A to no historisation, merge it with unhistorised table B, and insert the result into a table without historisation. In concrete SQL terms:

— CASE1-0->0

SEL COALESCE(B.ENT_ID, A.ENT_ID) AS ENT_ID, A.C_STATUS ,A.D_LEVEL_CD , A.E_VALUE, B.F_STATUS, B.G_LEVEL_CD, B.H_VALUE
FROM
(
— dehistorise A to most recent per ENT
sel A.ENT_ID, A.C_STATUS ,A.D_LEVEL_CD , A.E_VALUE
FROM TABLE_A1 A
WHERE (A.ENT_ID, A.CDD) IN
(SELECT ENT_ID, MAX(CDD) AS CDD FROM TABLE_A1 GROUP BY 1 )
) A
FULL OUTER JOIN
TABLE_B0 B
ON A.ENT_ID = B.ENT_ID;

Case 0-1->0 We downlevel table B this time. All other elements of the task are similar to the previous case.

Case 1-1->0 This case of a straight “dehistorisation” requires merely select the most recent entries from both tables A and B, strip them of their history fields, and insert the result into the target table.

 Case 0-2->0 As we dehistorise table B, we only select to most new and valid layer from table B per key, leave out the history attributes and merge with table A.

 Case 2-2->0 As the goal is again to dehistorise. We just have to adapt the SQL code to take the high historisation levels on both sides into account. All other assumptions being equal, the SQL looks like this:

— CASE 2-2-> 0

SEL COALESCE(B.ENT_ID, A.ENT_ID) AS ENT_ID, A.C_STATUS ,A.D_LEVEL_CD , A.E_VALUE, B.F_STATUS, B.G_LEVEL_CD, B.H_VALUE
FROM
(
— dehistorise A to most recent per ENT
sel A.ENT_ID, A.C_STATUS ,A.D_LEVEL_CD , A.E_VALUE
FROM TABLE_A2 A
WHERE A.VED>CURRENT_TIMESTAMP
AND (A.ENT_ID, A.RSD) IN
(SELECT ENT_ID, MAX(RSD) AS RSD FROM TABLE_A2 WHERE VED>CURRENT_TIMESTAMP GROUP BY 1)
) A
FULL OUTER JOIN
(
— dehistorise B to most recent per ENT
sel B.ENT_ID, B.F_STATUS, B.G_LEVEL_CD, B.H_VALUE
FROM TABLE_B2 B
WHERE B.VED>CURRENT_TIMESTAMP
AND (B.ENT_ID, B.RSD) IN
(SELECT ENT_ID, MAX(RSD) AS RSD FROM TABLE_B2 WHERE VED>CURRENT_TIMESTAMP 
GROUP BY 1)) B
ON A.ENT_ID = B.ENT_ID;

Case 1-2->0 Here, we need to match the two divergently historized tables together correctly to dehistorise the result in the end. To perform this, we have to make the intermediary assumption that the valid history is to be reduced to the most recent entry per key only. One historical state of the entity is considered the “solid” one in a target without history. Typically, this state is the most recent entry.

— CASE 1-2-> 0

SEL COALESCE(B.ENT_ID, A.ENT_ID) AS ENT_ID, A.C_STATUS ,A.D_LEVEL_CD , A.E_VALUE, B.F_STATUS, B.G_LEVEL_CD, B.H_VALUE
FROM
(
— dehistorise A to most recent per ENT
sel A.ENT_ID, A.C_STATUS ,A.D_LEVEL_CD , A.E_VALUE
FROM TABLE_A1 A
WHERE (A.ENT_ID, A.CDD) IN
(SELECT ENT_ID, MAX(CDD) AS CDD FROM TABLE_A1 GROUP BY 1 )
) A
FULL OUTER JOIN
(
— dehistorise B to most recent per ENT
sel B.ENT_ID, B.F_STATUS, B.G_LEVEL_CD, B.H_VALUE
FROM TABLE_B2 B
WHERE B.VED>CURRENT_TIMESTAMP
AND (B.ENT_ID, B.RSD) IN
(SELECT ENT_ID, MAX(RSD) AS RSD FROM TABLE_B2 WHERE VED>CURRENT_TIMESTAMP 
GROUP BY 1)
) B
ON A.ENT_ID = B.ENT_ID;

Case 0-2->1 Here, we perform a double-step shifting to a different historisation level while integrating an unhistorised table. Again, there is room for interpretation on the unhistorised A table and the treatment of entities not present in B. In any case, we treat merged histories with missing B entries as if the history of this combined result started at the merge date. Derived dates based on specific knowledge about the genesis of the source tables are possible as well.

There are two SQL versions based on how we interpret the non-historized part:

— CASE 0-2-> 1, FIRST Interpretation: A Table shows alltime valid history

SELECT

COALESCE(B.ENT_ID, A.ENT_ID) AS ENT_ID, A. C_STATUS ,A.D_LEVEL_CD , A.E_VALUE, B.F_STATUS, B.G_LEVEL_CD, B.H_VALUE, COALESCE(B.RSD,CURRENT_DATE), COALESCE(B.RED, DATE’3000-12-31′)
FROM
TABLE_B2 B
FULL OUTER JOIN
TABLE_A0 A
ON B.ENT_ID = A.ENT_ID
WHERE
(B.VED>CURRENT_TIMESTAMP
OR B.VED IS NULL )
;

— CASE 0-2-> 1, Second Interpretation: A Table shows most recent history only

SELECT
COALESCE(B.ENT_ID, A.ENT_ID) AS ENT_ID, A. C_STATUS ,A.D_LEVEL_CD , A.E_VALUE, B.F_STATUS, B.G_LEVEL_CD, B.H_VALUE, COALESCE(B.RSD,CURRENT_DATE), COALESCE(B.RED, DATE’3000-12-31′)
FROM
TABLE_B2 B
FULL OUTER JOIN
TABLE_A0 A
ON B.ENT_ID = A.ENT_ID
AND B.RED>DATE
WHERE
(B.VED>CURRENT_TIMESTAMP
OR B.VED IS NULL )
;

Case 1-2->1 This is the first instance that demands a more sophisticated treatment because two different history streams have to be aligned such that every time a change occurred either for table A or table B, a new resulting history row emerges. We execute this by first summarizing all distinct RSD points in time from both tables. Then we rejoin both tables to this timeline. Finally, we outline the constellations along the timeline to create a new RSD and RED history stream.

— CASE 1-2->1

SELECT ENT_ID, THESTARTDT, (CASE WHEN THEENDDT>CURRENT_DATE THEN DATE’3000-12-31′ ELSE THEENDDT ENDAS THEENDDT,C_STATUS ,D_LEVEL_CD , E_VALUE,F_STATUS, G_LEVEL_CD, H_VALUE
FROM
(SELECT
ENT_ID, C_STATUS ,D_LEVEL_CD , E_VALUE,F_STATUS, G_LEVEL_CD, H_VALUE, MIN(THEDATE) AS THESTARTDT, MAX(CASE WHEN THEEND<THEDATE THEN RED ELSE THEEND ENDAS THEENDDT
FROM
(
SELECT U.ENT_ID, U.RSD AS THEDATE,
MAX(U.RSD) OVER (PARTITION BY U. ENT_ID ORDER BY U. ENT_ID, U.RSD ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING ) -1 as THEEND,
A.C_STATUS ,A.D_LEVEL_CD , A.E_VALUE, A.RSD, A.RED,B.F_STATUS, B.G_LEVEL_CD, B.H_VALUE
FROM
(select ENT_ID, RSD
FROM
(— Convert A history to RSD, RED
sel A.ENT_ID, A. C_STATUS ,A.D_LEVEL_CD , A.E_VALUE,
MIN(CDD) AS RSD,
MAX(CDD) AS RED
FROM TABLE_A1 A
GROUP BY 1,2,3,4 ) A
GROUP BY 1,2
UNION
select ENT_ID, RED
FROM
(— Convert A history to RSD, RED
sel A.ENT_ID, A. C_STATUS ,A.D_LEVEL_CD , A.E_VALUE,
MIN(CDD) AS RSD,
MAX(CDD) AS RED
FROM TABLE_A1 A
GROUP BY 1,2,3,4 ) A
GROUP BY 1,2
UNION ALL
select ENT_ID, RSD
FROM
TABLE_B2 B
WHERE B.VED>CURRENT_TIMESTAMP
group by 1,2
UNION
select ENT_ID, RED
FROM
TABLE_B2 B
WHERE B.VED>CURRENT_TIMESTAMP
group by 1,2
) U
LEFT OUTER JOIN
(— Convert A history to RSD, RED
SEL A.ENT_ID, C_STATUS ,D_LEVEL_CD , E_VALUE, A.RSD, (CASE WHEN A.RED= AM.THEMAXDT THEN DATE’3000-12-31′ ELSE A.RED ENDAS RED
FROM
(sel A.ENT_ID, A. C_STATUS ,A.D_LEVEL_CD , A.E_VALUE,
MIN(CDD) AS RSD,
MAX(CDD) AS RED
FROM TABLE_A1 A
GROUP BY 1,2,3,4 ) A
JOIN
(Select A.ENT_ID, MAX(CDD) AS THEMAXDT
FROM TABLE_A1 A
GROUP BY 1 ) AM
ON A.ENT_ID = AM.ENT_ID) A
ON U.ENT_ID=A.ENT_ID
AND U.RSD BETWEEN A.RSD AND A.RED
LEFT OUTER JOIN
TABLE_B2 B
ON U.ENT_ID=B.ENT_ID
AND U.RSD BETWEEN B.RSD AND B.RED
AND B.VED>CURRENT_TIMESTAMP
) B
GROUP BY ENT_ID, C_STATUS ,D_LEVEL_CD , E_VALUE,F_STATUS, G_LEVEL_CD, H_VALUE
–HAVING THESTARTDT <=CURRENT_DATE
) F
order by 1,2,3;

Case 2-2->1 Practically speaking, this case is just a variation of the 1-2->1 case with both sides now having a validity layer that we flatten away by whittling down the list of rows to using the valid time layer only.

— CASE 2-2->1

SELECT ENT_ID, THESTARTDT, (CASE WHEN THEENDDT>CURRENT_DATE THEN DATE’3000-12-31′ ELSE THEENDDT ENDAS THEENDDT,C_STATUS ,D_LEVEL_CD , E_VALUE,F_STATUS, G_LEVEL_CD, H_VALUE
FROM
(
SELECT
ENT_ID, C_STATUS ,D_LEVEL_CD , E_VALUE,F_STATUS, G_LEVEL_CD, H_VALUE, MIN(THEDATE) AS THESTARTDT, MAX(CASE WHEN THEEND<THEDATE THEN RED ELSE THEEND ENDAS THEENDDT
FROM
(
SELECT U.ENT_ID, U.RSD AS THEDATE,
MAX(U.RSD) OVER (PARTITION BY U. ENT_ID ORDER BY U. ENT_ID, U.RSD ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING ) -1 as THEEND,
A.C_STATUS ,A.D_LEVEL_CD , A.E_VALUE, A.RSD, A.RED,B.F_STATUS, B.G_LEVEL_CD, B.H_VALUE
FROM
(select ENT_ID, RSD
FROM
(— Convert A history to RSD, RED
sel A.ENT_ID, A. C_STATUS ,A.D_LEVEL_CD , A.E_VALUE,
MIN(RSD) AS RSD,
MAX(RED) AS RED
FROM TABLE_A2 A
WHERE A.VED>CURRENT_TIMESTAMP
GROUP BY 1,2,3,4 ) A
GROUP BY 1,2
UNION
select ENT_ID, RED
FROM
(— Convert A history to RSD, RED
sel A.ENT_ID, A. C_STATUS ,A.D_LEVEL_CD , A.E_VALUE,
MIN(RSD) AS RSD,
MAX(RED) AS RED
FROM TABLE_A2 A
WHERE A.VED>CURRENT_TIMESTAMP
GROUP BY 1,2,3,4 ) A
GROUP BY 1,2
UNION ALL
select ENT_ID, RSD
FROM
TABLE_B2 B
WHERE B.VED>CURRENT_TIMESTAMP
group by 1,2
UNION
select ENT_ID, RED
FROM
TABLE_B2 B
WHERE B.VED>CURRENT_TIMESTAMP
group by 1,2
) U
LEFT OUTER JOIN
(— Convert A history to RSD, RED
SEL A.ENT_ID, C_STATUS ,D_LEVEL_CD , E_VALUE, A.RSD, (CASE WHEN A.RED= AM.THEMAXDT THEN DATE’3000-12-31′ ELSE A.RED ENDAS RED
FROM
sel A.ENT_ID, A. C_STATUS ,A.D_LEVEL_CD , A.E_VALUE,
MIN(RSD) AS RSD,
MAX(RED) AS RED
FROM TABLE_A2 A
WHERE A.VED>CURRENT_TIMESTAMP
GROUP BY 1,2,3,4 ) A
JOIN
Select A.ENT_ID, MAX(RED) AS THEMAXDT
FROM TABLE_A2 A
WHERE A.VED>CURRENT_TIMESTAMP
GROUP BY 1 ) AM
ON A.ENT_ID = AM.ENT_ID
) A
ON U.ENT_ID=A.ENT_ID
AND U.RSD BETWEEN A.RSD AND A.RED
LEFT OUTER JOIN
TABLE_B2 B
ON U.ENT_ID=B.ENT_ID
AND U.RSD BETWEEN B.RSD AND B.RED
AND B.VED>CURRENT_TIMESTAMP
) B
GROUP BY ENT_ID, C_STATUS ,D_LEVEL_CD , E_VALUE,F_STATUS, G_LEVEL_CD, H_VALUE
–HAVING THESTARTDT <=CURRENT_DATE
) F
order by 1,2,3;

Teradata Vantage Upleveling Cases for Historisation

Next, we discuss table merge cases that should simultaneously lead to a net increase in the historisation level. Not only are we asked to merge, but we also need to adapt the result to the introduction of one more level of historisation than the source tables have had up to now.

Historisation upleveling can be any of the following cases: (0-0-1, 0-0-2, 0-1-2, 0-2-2, 1-1-2).

Case 0-0->1 is again the most trivial case from a technical point of view. Nevertheless, it requires some clarification. The only historisation we can introduce to the resulting table is a present and future one. We have nothing else than the most recent load result in the A and B tables and can therefore do nothing more than provide the current date or some other fixed date coming from business considerations as RSD or CDD.

Case 0-0->2 leads to the initialization of two layers at once. We take the date and timestamp of the time we merge and write them into the first and second layers of the historisation of the target table.

Case 0-1->2 We depart from case 0-1->1 in technical terms. All we have to add is the introduction of the second historisation layer from the historized tables, which can only be in a proactive way as there are is no past validity.

Note that we take the first interpretation of case 0-1->1 as the foundation.

— CASE 0-1-> 2, based on Interpretation that A Table shows the all-time valid history

SELECT
COALESCE(B.ENT_ID, A.ENT_ID) AS ENT_ID, A. C_STATUS ,A.D_LEVEL_CD , A.E_VALUE, B.F_STATUS, B.G_LEVEL_CD, B.H_VALUE, COALESCE(B.RSD,CURRENT_DATE), COALESCE(B.RED, DATE’3000-12-31′),
CAST(CURRENT_TIMESTAMP AS TIMESTAMP(6) ) AS VSD, TIMESTAMP ‘3000-12-31 23:59:59.000000’ AS VED
FROM
TABLE_B1 B
FULL OUTER JOIN
TABLE_A0 A
ON B.ENT_ID = A.ENT_ID
;

Case 0-2->2 Just as with the previous case, we have two options regarding how to deal with A. The difference now is to transport the entire content of table B to the target. Note the implicit decision in the second interpretation that B’s invalidated records are not matched with the A table content.

— CASE 0-2-> 2 FIRST Interpretation: A Table shows alltime valid history

SELECT
COALESCE(B.ENT_ID, A.ENT_ID) AS ENT_ID, A. C_STATUS ,A.D_LEVEL_CD , A.E_VALUE, B.F_STATUS, B.G_LEVEL_CD, B.H_VALUE,
COALESCE(B.RSD,CURRENT_DATE),COALESCE(B.VSD, CAST(CURRENT_TIMESTAMP AS TIMESTAMP(6) )) AS VSD,
COALESCE(B.RED, DATE’3000-12-31′), COALESCE(B.VED, CAST(CURRENT_TIMESTAMP AS TIMESTAMP(6) )) AS VED
FROM
TABLE_B2 B
FULL OUTER JOIN
TABLE_A0 A
ON B.ENT_ID = A.ENT_ID;

— CASE 0-2-> 2, Second Interpretation: A Table shows most recent history only

SELECT
COALESCE(B.ENT_ID, A.ENT_ID) AS ENT_ID, A. C_STATUS ,A.D_LEVEL_CD , A.E_VALUE, B.F_STATUS, B.G_LEVEL_CD, B.H_VALUE,
COALESCE(B.RSD,CURRENT_DATE),
COALESCE(B.VSD, CAST(CURRENT_TIMESTAMP AS TIMESTAMP(6) )) AS VSD,
COALESCE(B.RED, DATE’3000-12-31′), COALESCE(B.VED, CAST(CURRENT_TIMESTAMP AS TIMESTAMP(6) )) AS VED
FROM
TABLE_B2 B
FULL OUTER JOIN
TABLE_A0 A
ON B.ENT_ID = A.ENT_ID
AND B.RED>DATE
AND
(B.VED>CURRENT_TIMESTAMP
OR B.VED IS NULL );

 Case 1-1->2 Once we have a solution for the base case 1-1->1 (see part 4 of this series), we can depart from it to solve this case by adding a present and future validity layer set of columns.

Teradata Vantage Straight Continuation Cases for Historisation

Here we present those cases that start and end at the same level of historisation. While these cases sound like the simplest of merges, they are, in fact, some of the most demanding ones. There are three such cases: (0-0-0, 1-1-1, 2-2-2).

Case 0-0->0 is, indeed, trivial. Just (full outer) join the two tables A and B and insert the result into the target table.

Case 1-1->1 We have to take care to match divergent streams of historic changes together to one consistent history that leads to a new entry per key whenever there was a change in table A or B.

We summarize the snapshot history by picking the minimum and maximum CDD from A per key and non-key attribute combinations over time. The following code gathers all points in time where either Table A or Table B starts a new historic layer for an entity, calculates the resulting RSD and RED row, and joins A and B table contents together along with this time constellations. We arrive at a unified history layer that opens a new time layer whenever A or B changes the row content.

— CASE 1-1-> 1

SELECT ENT_ID, THESTARTDT, (CASE WHEN THEENDDT>CURRENT_DATE THEN DATE’3000-12-31′ ELSE THEENDDT ENDAS THEENDDT,
C_STATUS ,D_LEVEL_CD , E_VALUE,F_STATUS, G_LEVEL_CD, H_VALUE
FROM
(
SELECT
ENT_ID, C_STATUS ,D_LEVEL_CD , E_VALUE,F_STATUS, G_LEVEL_CD, H_VALUE, MIN(THEDATE) AS THESTARTDT, MAX(THEEND) AS THEENDDT
FROM
(
SELECT U.ENT_ID, U.RSD AS THEDATE,
MAX(U.RSD) OVER (PARTITION BY U. ENT_ID ORDER BY U. ENT_ID, U.RSD ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING ) -1 as THEEND,
A. C_STATUS ,A.D_LEVEL_CD , A.E_VALUE, A.RSD, A.RED,B.F_STATUS, B.G_LEVEL_CD, B.H_VALUE
FROM
(select ENT_ID, RSD
FROM
(— Convert A history to RSD, RED
sel A.ENT_ID, A. C_STATUS ,A.D_LEVEL_CD , A.E_VALUE,
MIN(CDD) AS RSD,
MAX(CDD) AS RED
FROM TABLE_A1 A
GROUP BY 1,2,3,4 ) A
GROUP BY 1,2
UNION
select ENT_ID, RED
FROM
(— Convert A history to RSD, RED
sel A.ENT_ID, A. C_STATUS ,A.D_LEVEL_CD , A.E_VALUE,
MIN(CDD) AS RSD,
MAX(CDD) AS RED
FROM TABLE_A1 A
GROUP BY 1,2,3,4 ) A
GROUP BY 1,2
UNION ALL
select ENT_ID, RSD
FROM
TABLE_B1 B
group by 1,2
UNION
select ENT_ID, RED
FROM
TABLE_B1 B
group by 1,2
) U
LEFT OUTER JOIN
(— Convert A history to RSD, RED
sel A.ENT_ID, A. C_STATUS ,A.D_LEVEL_CD , A.E_VALUE,
MIN(CDD) AS RSD,
MAX(CDD) AS RED
FROM TABLE_A1 A
GROUP BY 1,2,3,4 ) A
ON U.ENT_ID=A.ENT_ID
AND U.RSD BETWEEN A.RSD AND A.RED
LEFT OUTER JOIN
TABLE_B1 B
ON U.ENT_ID=B.ENT_ID
AND U.RSD BETWEEN B.RSD AND B.RED
) B
GROUP BY ENT_ID, C_STATUS ,D_LEVEL_CD , E_VALUE,F_STATUS, G_LEVEL_CD, H_VALUE
HAVING THESTARTDT <=CURRENT_DATE
) F
order by 1,2,3;

Case 2-2->2 This appears to be the most complicated case. We have to align not just two possibly asynchronous RSD and RED histories. Still, given this alignment, we also have to decide on alignments of intermediate layers of invalidated histories. Besides the technical complexity, we assume that we are only interested in the histories deemed valid when we merge. Space considerations are another concern: a complete treatment of this case could lead to a merged target table being much bigger than the two merged tables together. Therefore we simplify this case to being similar to Case 1-1->1 with an additional condition on the valid time layers being selected. Therefore we can use the SQL of Case 1-1->1 as a pattern. All that remains is to add present and default openness timestamps to the resulting column set.

Teradata Vantage Mixed Continuation Cases of Historisations

While the cases in the final part of our Teradata Vantage historisation series do not mean any change in the watermark historisation level of one side of the merge and are, as such, a mere net continuation, they also include an up-leveling on the other side and are therefore neither a conceptually straightforward merge-and-continue on a particular level nor a complete level shift. Therefore let us treat these cases as a class of its own. They are three such constellations: (0-1-1, 0-2-2, 1-2-2).

 Case 0-1->1 This case is less straightforward than it might seem at first glance. While we have a historized table B1, we have to assume how to interpret the data in Table A1. First, they could be seen as a stable history and ready to be merged with all historic rows per primary key in table B1. Second, they could be seen as just the most recent state per primary key. This means that the history of the Table A1 part of the entity is unknown and left NULL for any time before now. Therefore, we join A1 to the most recent entry in Table B1 and store this time series in the target table. Furthermore, we have to decide on how to initialize the history of those entities of A that have no record in B. We treat them as if they appeared at the date of the transformation. Any other date must be derived from information outside of what the tables alone provide and requires a business understanding or knowledge of the data warehouse’s history and development.

You can see the difference in both versions discussed when you look at the SQL of both:

— CASE 0-1-> 1, FIRST Interpretation: A Table shows alltime valid history

SELECT
COALESCE(B.ENT_ID, A.ENT_ID) AS ENT_ID, A. C_STATUS ,A.D_LEVEL_CD , A.E_VALUE, B.F_STATUS, B.G_LEVEL_CD, B.H_VALUE, COALESCE(B.RSD,CURRENT_DATE), COALESCE(B.RED, DATE’3000-12-31′)
FROM
TABLE_B1 B
FULL OUTER JOIN
TABLE_A0 A
ON B.ENT_ID = A.ENT_ID
;

— CASE 0-1-> 1, SECOND Interpretation: A Table shows most recent history only

SELECT
COALESCE(B.ENT_ID, A.ENT_ID) AS ENT_ID, A. C_STATUS ,A.D_LEVEL_CD , A.E_VALUE, B.F_STATUS, B.G_LEVEL_CD, B.H_VALUE, COALESCE(B.RSD,CURRENT_DATE), COALESCE(B.RED, DATE’3000-12-31′)
FROM
TABLE_B1 B
FULL OUTER JOIN
TABLE_A0 A
ON B.ENT_ID = A.ENT_ID
AND B.RED>DATE;

Case 0-2->2 We treat this case as an extension to the solution of Case 0-1->2. Making the appropriate assumption about what the unhistorised table A holds – a steady all-time list or most recent data only – we adhere to the validity of table B in the result. In case there is no B history for an entity, we state the presence and openness validity as default.

— CASE 0-2> 2, based on Interpretation that A Table shows all-time valid history AND it fits into the validity of table B

SELECT

COALESCE(B.ENT_ID, A.ENT_ID) AS ENT_ID, A. C_STATUS ,A.D_LEVEL_CD , A.E_VALUE, B.F_STATUS, B.G_LEVEL_CD, B.H_VALUE, COALESCE(B.RSD,CURRENT_DATE), COALESCE(B.RED, DATE’3000-12-31′),
COALESCE(B.VSD, CAST(CURRENT_TIMESTAMP AS TIMESTAMP(6) ) ) AS VSD, COALESCE(B.VED, TIMESTAMP ‘3000-12-31 23:59:59.000000’ ) AS VED
FROM
TABLE_B2 B
FULL OUTER JOIN
TABLE_A0 A
ON B.ENT_ID = A.ENT_ID;

Case 1-2->2 There is a conceptual problem; in this case, we have to solve first. How can we preserve any input validity if the task involves aligning two not necessarily congruent RSD and RED histories? We answer that we cannot do so in a meaningful way. Therefore, we reduce this case to the solution of Case 1-1->1 and add a present and openness validity layer on top of it. This is equal to Case 1-1->2.

__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"62516":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"62516":{"val":"var(--tcb-skin-color-0)"}},"gradients":[]},"original":{"colors":{"62516":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"b4fbe":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"b4fbe":{"val":"rgb(241, 99, 52)"}},"gradients":[]},"original":{"colors":{"b4fbe":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
Previous Article
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"b4fbe":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"b4fbe":{"val":"rgb(241, 99, 52)"}},"gradients":[]},"original":{"colors":{"b4fbe":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
Next Article
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"62516":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"62516":{"val":"rgb(255, 0, 0)"}},"gradients":[]}}]}__CONFIG_colors_palette__
GET OUR TERADATA BOOK

Paul Timar

Paul Timar is a graduated Sociologist and Economist who turned Data Warehouse Specialist. He has more than 7 years of experience in working with Oracle and Teradata Data Warehouses in the fields of Social Security Data and Banking. His latest focus is on performance optimization.

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

You might also like

>