Improve Your Teradata Load Performance

Roland Wenzlofsky

June 21, 2022



What Influences Teradata Load Performance?

Teradata allows only a limited number of bulk loads to run concurrently because they consume many resources. The precise limit depends on your Teradata system; typically it is in the low two-digit range.

Whenever the utility limit is reached, additional load requests end up in a workload queue. As a result, our load run times become volatile.
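Whether your loads are affected can be checked in the QueryLog. Here is a minimal sketch, assuming query logging is enabled (the availability of the delay columns DelayTime and WDDelayTime varies by Teradata release):

SELECT
    SessionID,
    UserName,
    StartTime,
    DelayTime,     -- total delay in seconds (system- and workload-related)
    WDDelayTime    -- workload-specific delay
FROM DBC.DBQLogTbl
WHERE StatementType = 'Begin Loading'
  AND StartTime(DATE) BETWEEN DATE - 7 AND DATE
  AND COALESCE(DelayTime, 0) > 0
ORDER BY DelayTime DESC;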

To achieve optimal Teradata load performance, it is important to choose the right loading method for each table.

Bulk loads for huge tables

Depending on how many rows our tables contain, but also on how many columns and of which data types, it may be better not to stage our data with bulk load utilities (FastLoad, MultiLoad, TPT bulk load) but to prefer transactional load methods instead (BTEQ, TPump).

The larger the table to be loaded, the greater the advantage of bulk load utilities, because they do not load row by row but whole data blocks at once. Combined with NOPI tables, the advantage over transactional loads can increase enormously because the sort phase of the FastLoad is not needed.
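For example, a NOPI staging table is simply created without a primary index. A minimal sketch; the database and column names are illustrative:

CREATE MULTISET TABLE Staging.Sales_Stage,
    NO FALLBACK
(
    Sale_Id  BIGINT,
    Sale_Dt  DATE FORMAT 'YYYY-MM-DD',
    Amount   DECIMAL(18,2)
)
NO PRIMARY INDEX;

Because rows are not hashed to a primary index, FastLoad can append whole data blocks without the sort phase mentioned above.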

A separate article covers how to use NOPI tables correctly.

Transactional loads for small tables

Transactional loads use the transient journal to allow rollbacks. I therefore recommend these methods for small tables with a few thousand rows. Transactional loads of small tables are always preferable to bulk loads: on the one hand, we achieve more stable load times; on the other hand, we leave the utility slots free for the large tables that need them.

We must also remember that bulk load utilities use many sessions in parallel. Logging on these sessions can take up a large part of the total time if the number of rows to be loaded is small.
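As an illustration, here is a minimal BTEQ sketch of such a transactional load. The logon string, file name, and table are placeholders, and the input is assumed to be comma-delimited text (with .IMPORT VARTEXT, all USING fields must be VARCHAR):

.LOGON mytdpid/loaduser,password
.IMPORT VARTEXT ',' FILE = sales_small.csv
.QUIET ON
.REPEAT *
USING (Sale_Id VARCHAR(20), Amount VARCHAR(20))  /* VARTEXT requires VARCHAR */
INSERT INTO Staging.Sales_Small (Sale_Id, Amount)
VALUES (:Sale_Id, :Amount);
.LOGOFF
.QUIT

Each row goes through the transient journal as a regular INSERT, which keeps run times stable and does not occupy a utility slot.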


Finding incorrectly selected load methods

In an existing data warehouse ETL process, it is helpful to look for inconsistencies in the loading, i.e., large tables loaded transactionally and small tables loaded with a bulk load utility. The QueryLog (DBC.DBQLogTbl) contains all the information we need for this analysis. Here is an example SQL statement you can use in practice:

SELECT TheTable, UtilityRowCount, StartTime, FirstRespTime, RunTime
FROM
(
    SELECT
        -- The query text of a 'Begin Loading' step starts with
        -- 'BEGIN LOADING <table> ...'; position 15 is the table name.
        SUBSTR(t01.QueryText, 15) AS TABLE_START,
        SUBSTR(TABLE_START, 1, INDEX(TABLE_START, ' ')) AS TheTable,
        t02.UtilityRowCount,
        t02.TotalIOCount,
        t01.StartTime,
        t02.FirstRespTime,
        (t02.FirstRespTime - t01.StartTime DAY(2) TO SECOND(6)) AS RunTime,
        ( 24*60*60 * EXTRACT(DAY    FROM RunTime)
        +    60*60 * EXTRACT(HOUR   FROM RunTime)
        +       60 * EXTRACT(MINUTE FROM RunTime)
        +            EXTRACT(SECOND FROM RunTime) ) (DECIMAL(18,2)) AS RunTime_Secs
    FROM DBC.DBQLogTbl t01
    INNER JOIN DBC.DBQLogTbl t02
        ON t01.SessionID = t02.SessionID
    WHERE t01.StartTime(DATE) BETWEEN DATE - 21 AND DATE
      AND t01.StatementType = 'Begin Loading'
      AND t02.StartTime(DATE) BETWEEN DATE - 21 AND DATE
      AND t02.StatementType = 'End Loading'
) x;


The evaluation lists all FastLoads and MultiLoads performed in a certain period, together with the number of rows loaded. From this query, you can identify the bulk loads that should be switched to transactional loading. In the example below, MyTable2 is a candidate:

TheTable   UtilityRowCount   StartTime             FirstRespTime         RunTime
MyTable    4,352,814.00      16.07.2015 00:25:24   16.07.2015 00:25:59   0 00:00:35.450000
MyTable2   100.00            16.07.2015 00:04:25   16.07.2015 04:25:30   0 00:00:05.450000
Teradata Load Performance – Choose the correct load method

Unfortunately, this method has a limitation: it works only if no filters on CPU seconds are defined for the QueryLog. Otherwise, most of the “BEGIN LOADING” statements will be missing, and you can’t make a correct evaluation.

Here is a link to the official Teradata documentation for NOPI tables:

Teradata Documentation: NOPI Tables

  • David Lange says:

    Thanks for the articles and examples. This is a very useful practical query, would like to see more.

  • Hi,

    I am not sure whether delays caused by missing utility slots are shown in these two columns (DelayTime and WDDelayTime). Maybe somebody else can comment on this?

    Usually, I pick the session of my FastLoad from the DBC.LogOnOff view. I take the LogonTime column of this view and calculate the difference between the StartTime column of DBC.DBQLogTbl and the LogonTime column of DBC.LogOnOff. The gap between the logon and the first activity in the QueryLog seems to be a good indication of how long the FastLoad was waiting for a utility slot.
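    A sketch of this comparison (the column names follow the DBC views; the join on the session number is simplified and illustrative):

    SELECT
        q.SessionID,
        l.LogonDate,
        l.LogonTime,                        -- session logon
        MIN(q.StartTime) AS FirstActivity   -- first request in the QueryLog
    FROM DBC.DBQLogTbl q
    INNER JOIN DBC.LogOnOff l
        ON q.SessionID = l.SessionNo
    WHERE l.Event = 'Logon'
    GROUP BY 1, 2, 3;

    A large gap between LogonTime and FirstActivity suggests that the load was waiting for a utility slot.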

    I just checked the documentation, and actually you are right: the delays should be visible in both columns you mentioned. WDDelay shows the workload delays, and Delay should sum up all delays that happened (system- and workload-specific).

    My approach described above may therefore be wrong. But I am not sure 😉

    Hope this helps.

    Roland

  • Hello Roland, thank you for this article.

    Using the above-mentioned SQL, I found more than 40 tables in our DWH system; however, I also found no utility loads with DelayTime > 0 or WDDelayTime > 0.
    Can I conclude that I do not need to convert those 40 FastLoad/MultiLoad jobs to BTEQ/TPT loads?
    Please advise.

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}
