While stage tables are temporary objects for intermediate data, a well-designed table can distinguish between a high-performing load and a slow one. The extra effort pays off quickly.
Selecting appropriate data types and sizes is crucial. The load process inserts bytes into Teradata, impacting the load time. Optimizing the load process involves choosing the appropriate loading utility and minimizing the data volume.
Consider utilizing Multivalue compression and NoPi tables as beneficial tools.
Improper data types can significantly increase load times. In this example, we load a file twice into Teradata using fast load. The difference is that the first load defines the textual columns as CHAR, while the second load defines them as VARCHAR.
The average length of each column (COl1 to COl10) is short. Using CHAR columns is inefficient as they are filled with unnecessary spaces.
Using CHAR(1000) as the data type for all columns results in a fast load duration of over 10 seconds for every 100,000 rows.
CREATE SET TABLE DB.CHAR_TABLE
(
COL1 CHAR(1000),
COL2 CHAR(1000),
COL3 CHAR(1000),
COL4 CHAR(1000),
COL5 CHAR(1000),
COL6 CHAR(1000),
COL7 CHAR(1000),
COL8 CHAR(1000),
COL9 CHAR(1000),
COL10 CHAR(1000)
PRIMARY INDEX (COL1);
**** 19:10:05 Number of recs/msg: 10
**** 19:10:05 Starting to send to RDBMS with record 1
**** 19:10:19 Starting row 100000
**** 19:10:32 Starting row 200000
**** 19:10:49 Starting row 300000
**** 19:10:59 Starting row 400000
**** 19:11:02 Finished sending rows to the RDBMS
Using the VARCHAR(1000) data type significantly improves fast load performance, with an impressive rate of only 1 second per 100,000 rows.
CREATE SET TABLE DB.VARCHAR_TABLE
(
COL1 VARCHAR(1000),
COL2 VARCHAR(1000),
COL3 VARCHAR(1000),
COL4 VARCHAR(1000),
COL5 VARCHAR(1000),
COL6 VARCHAR(1000),
COL7 VARCHAR(1000),
COL8 VARCHAR(1000),
COL9 VARCHAR(1000),
COL10 VARCHAR(1000)
)
PRIMARY INDEX (COL1);
**** 19:20:28 Number of recs/msg: 10
**** 19:20:28 Starting to send to RDBMS with record 1
**** 19:20:28 Starting row 100000
**** 19:20:29 Starting row 200000
**** 19:20:29 Starting row 300000
**** 19:20:30 Starting row 400000
**** 19:20:31 Finished sending rows to the RDBMS
Changing the data type improved the load speed by 10.
The efficient table design is crucial for handling wide rows as they decrease the number of rows fitting into each data block. This, in turn, leads to increased load times as IO operations are expensive.