What influences the Teradata Load Performance?
Due to the high consumption of resources, Teradata restricts the number of bulk loads that can be executed simultaneously. The specific limit varies depending on the Teradata system and typically falls within the range of 10 to 99.
Further requests will be added to a workload queue once the utility limit has been reached, causing our loading run times to be unpredictable.
Selecting the appropriate loading method per table is crucial for achieving maximum Teradata load efficiency.
Bulk loads for huge tables
Depending on the number of rows and columns, transactional loading methods (such as BTEQ and TPump) may be preferable over bulk load utilities (such as Fastload, Multiload, and TPT Bulk Load) with varying data types in our tables.
Bulk load utilities are particularly advantageous when dealing with larger tables as they can load whole data blocks simultaneously rather than row by row. This advantage is even greater when used in conjunction with NOPI tables, as it eliminates the need for the sorting phase required in transactional loads.
Learn the proper utilization of NOPI tables through this article.
Transactional loads for small tables
For rollbacks, utilize the transient journal when executing transactional loads. I suggest this approach for small tables comprising only a few thousand rows. Opting for transactional loads instead of bulk loads is more desirable for small tables due to the increased stability of load times. Additionally, this approach frees up utility slots for larger tables that require them.
It’s important to consider that bulk load utilities run multiple parallel sessions. Logging during these sessions can significantly increase the time it takes to load data if the number of rows being loaded is small.
Finding incorrectly selected load methods
Analyzing the loading method for tables can be beneficial for identifying inconsistencies in an ETL process within a data warehouse. The utilization of a query log, such as DBC.DBQLOGTBL, provides comprehensive information for this analysis. The following SQL statement can be implemented: SELECT QueryID, QueryBand, UserName, StartTime, RequestText FROM DBC.DBQLOGTBL WHERE RequestText LIKE ‘%INSERT INTO%’;
SELECT TheTable,UtilityRowCount,Starttime,FirstRespTime,RunTime
FROM
(
SELECT
SUBSTR(t01.QUERYTEXT,15) AS TABLE_START,
SUBSTR(TABLE_START,1,MINDEX(TABLE_START,' ')) AS TheTABLE,
t02.UtilityRowCount,
t02.TotalIOCount,
t01.Starttime,
t02.FirstRespTime,
(t02.FIRSTRESPTIME - t01.STARTTIME DAY(2) TO SECOND(6) ) RUNTIME,
60*60*EXTRACT(HOUR FROM RUNTIME) + 60*EXTRACT (MINUTE FROM RUNTIME) +
EXTRACT(SECOND FROM RUNTIME) (DECIMAL(18,2)) AS RUNTIME_SECS
FROM DBC.DBQLogTbl t01
INNER JOIN
DBC.DBQLogTbl t02
ON
t01.SESSIONID = t02.SESSIONID
WHERE
t01.STARTTIME(DATE) BETWEEN DATE-21 AND DATE
AND t01.StatementType = 'BEGIN Loading'
AND t02.STARTTIME(DATE) BETWEEN DATE-21 AND DATE
AND t02.StatementType = 'END Loading'
) x;
The evaluation records all fastloads and multiloads executed within a specific timeframe and the corresponding number of loaded rows. Through this inquiry, you can determine which bulk loads necessitate conversion to transactional loading. In this case, MyTable2 is a suitable candidate, as presented below.
TheTable | UtilityRowCount | Starttime | FirstRespTime | RunTime |
MyTable | 4.352.814,00 | 16.07.2015 00:25:24 | 16.07.2015 00:25:59 | 0 00:00:35.450000 |
MyTable2 | 100,00 | 16.07.2015 00:04:25 | 16.07.2015 04:25:30 | 0 00:00:05.450000 |
Regrettably, this approach is limited to cases without CPU second filters for the QueryLog. In such scenarios, most of the “BEGIN LOADING” statements will be absent, rendering any evaluation inaccurate.
Here is a link to the official Teradata documentation for NOPI tables:
SELECT
U.LOGDATE,
U.USERNAME,
U.UTILITYNAME,
count(*) as Utility_Count
FROM
PDCRINFO.DBQLUtilityTbl_HST U
WHERE LOGDATE > DATE-7
AND USERNAME = ‘service_abc_xyz_Etl’
and UTILITYNAME in
(
‘FASTEXP’,
‘FASTLOAD’,
‘TPTLOAD’,
‘MULTLOAD’,
‘TPTUPD’,
‘TPTEXP’
)
GROUP BY 1,2,3
;