Teradata Stage Table Design

Roland Wenzlofsky

June 24, 2015

minutes reading time


Even though stage tables are only temporary objects holding intermediate data, a conventional table design can differentiate between a good-performing load and a load proceeding at a snail’s pace. The additional efforts are paying off soon.

The correct choice of data types and sizes is essential. The load utility must push some bytes into Teradata, which determines the load time—using the suitable loading utility and minimizing the amount of data are primary goals in load optimization.

Multivalue compression and  NoPi tables should be additional tools you should consider.

Below is an example of how a poorly chosen data type can significantly increase your load times. We are loading the same file twice into Teradata (fast load). The only difference is that the textual columns are first defined as CHAR, and the second time they are defined as VARCHAR.

The average text length of each column (COl1 to COl10) is a few characters only. Using CHAR columns means we are wasting a lot of space (CHAR columns are filled with spaces to the full length).

When using CHAR(1000) as the data type for all columns, the fast load needs more than 10 seconds per 100,000 rows.

CREATE SET TABLE DB.CHAR_TABLE
(
COL1 CHAR(1000),
COL2 CHAR(1000),
COL3 CHAR(1000),
COL4 CHAR(1000),
COL5 CHAR(1000),
COL6 CHAR(1000),
COL7 CHAR(1000),
COL8 CHAR(1000),
COL9 CHAR(1000),
COL10 CHAR(1000)
PRIMARY INDEX (COL1);

**** 19:10:05 Number of recs/msg: 10
**** 19:10:05 Starting to send to RDBMS with record 1
**** 19:10:19 Starting row 100000
**** 19:10:32 Starting row 200000
**** 19:10:49 Starting row 300000
**** 19:10:59 Starting row 400000
**** 19:11:02 Finished sending rows to the RDBMS

If we change the data type and use VARCHAR(1000), the fast load is much faster and only needs about 1 second per 100,000 rows.

CREATE SET TABLE DB.VARCHAR_TABLE
(
COL1 VARCHAR(1000),
COL2 VARCHAR(1000),
COL3 VARCHAR(1000),
COL4 VARCHAR(1000),
COL5 VARCHAR(1000),
COL6 VARCHAR(1000),
COL7 VARCHAR(1000),
COL8 VARCHAR(1000),
COL9 VARCHAR(1000),
COL10 VARCHAR(1000)
)
PRIMARY INDEX (COL1);

**** 19:20:28 Number of recs/msg: 10
**** 19:20:28 Starting to send to RDBMS with record 1
**** 19:20:28 Starting row 100000
**** 19:20:29 Starting row 200000
**** 19:20:29 Starting row 300000
**** 19:20:30 Starting row 400000
**** 19:20:31 Finished sending rows to the RDBMS

Only by changing the data type, the load speed was increased by 10.

Proper table design becomes critical when we are dealing with wide rows. Wide rows cause only a few rows to fit into each data block, and as IOs are costly, load times increase significantly.

{"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

You might also like

>