Teradata Stage Table Design

Even though stage tables are only temporary objects, holding intermediate data, a conventional table design can make the difference between a good performing load and a load which is proceeding at a snail’s pace. The additional efforts are paying off soon.

The correct choice of data types and sizes is important. At the end of the day, it’s some bytes which have to be pushed into Teradata which determines the load time. Using the suitable loading utility and minimizing the amount of data are primary goals in load optimization.

Multivalue compression and  NoPi tables should be additional tools you should consider.

Below is an example, how a poorly chosen data type can significantly increase your load times. We are loading the exactly same file twice into Teradata (fast load). The only difference is that the first time the textual columns are defined as CHAR and the second time they are defined as VARCHAR.

The average text length of each column (COl1 to COl10) is a few characters only. The usage of CHAR columns means we are wasting a lot of space (CHAR columns are filled up with spaces to the full length).

When using CHAR(1000) as the data type for all columns, the fast load needs more than 10 seconds per 100,000 rows.

CREATE SET TABLE DB.CHAR_TABLE
(
COL1 CHAR(1000),
COL2 CHAR(1000),
COL3 CHAR(1000),
COL4 CHAR(1000),
COL5 CHAR(1000),
COL6 CHAR(1000),
COL7 CHAR(1000),
COL8 CHAR(1000),
COL9 CHAR(1000),
COL10 CHAR(1000)
PRIMARY INDEX (COL1);

**** 19:10:05 Number of recs/msg: 10
**** 19:10:05 Starting to send to RDBMS with record 1
**** 19:10:19 Starting row 100000
**** 19:10:32 Starting row 200000
**** 19:10:49 Starting row 300000
**** 19:10:59 Starting row 400000
**** 19:11:02 Finished sending rows to the RDBMS

If we are changing the data type and are using VARCHAR(1000), the fastload is much faster and only needs about 1 second per 100,000 rows.

CREATE SET TABLE DB.VARCHAR_TABLE
(
COL1 VARCHAR(1000),
COL2 VARCHAR(1000),
COL3 VARCHAR(1000),
COL4 VARCHAR(1000),
COL5 VARCHAR(1000),
COL6 VARCHAR(1000),
COL7 VARCHAR(1000),
COL8 VARCHAR(1000),
COL9 VARCHAR(1000),
COL10 VARCHAR(1000)
)
PRIMARY INDEX (COL1);

**** 19:20:28 Number of recs/msg: 10
**** 19:20:28 Starting to send to RDBMS with record 1
**** 19:20:28 Starting row 100000
**** 19:20:29 Starting row 200000
**** 19:20:29 Starting row 300000
**** 19:20:30 Starting row 400000
**** 19:20:31 Finished sending rows to the RDBMS

Only by changing the data type, the load speed was increased by a factor of 10.

Proper table design becomes especially important when we are dealing with wide rows. Wide rows cause that only a few rows fit into each data block and as IOs are costly, load times increase significantly.

Questions?
If you have any questions about all this, please ask in the comments! I’ll be paying close attention and answering as many as I can. Thank you for reading. Whatever this blog has become, I owe it all to you.
Roland Wenzlofsky
 

Roland Wenzlofsky is a graduated computer scientist and Data Warehouse professional working with the Teradata database system for more than 15 years. He is experienced in the fields of banking and telecommunication with a strong focus on performance optimization.

>