Teradata offers multiple methods for external data loading, which can be categorized into two groups.
The bulk load utilities facilitate the accelerated data transfer to the target table by circumventing the transient journal. At the same time, the transactional utilities involve using the transient journal, which results in a slower process.
Every type has its own set of pros and cons. You can load your data in either manner based on the specific needs.
Traditionally, individual load utilities offered varying levels of usability and syntax. However, Teradata now offers a universal tool, the Teradata Parallel Transporter (TPT), that integrates the functionality of all these distinct instruments. Despite this, specialized tools remain available for use.
TPT now boasts many useful features, yet its scripting language remains inconvenient.
The Teradata Utilities for loading
The utilities for transactional load are BTEQ and TPump, while Fastload and Multiload are known as bulk load utilities as they bypass the transient journal.
This article does not aim to teach syntax for these tools. Refer to the Official Teradata Documentation for syntax information. The purpose of this article is to provide guidance on the appropriate use of each tool.
Fastload and Multiload are highly efficient load utilities because they load data in 64-kilobyte blocks. In contrast, BTEQ and TPump insert each row separately, providing transaction handling with a rollback function.
Bulk load utilities have certain restrictions due to their lack of transaction logs.
- Indexes, triggers, and referential integrity rules must be removed before loading.
- Fastload requires the target table to be empty.
To load the set tables, a duplicate row check is necessary. However, duplicate row checks cannot be performed since data is loaded in blocks. Teradata addresses this issue by filtering rows before the database entry.
Multiload is a versatile utility that can perform various tasks beyond loading, including UPDATEs, DELETEs, and UPSERTs. Unlike Fastload, Multiload utilizes temporary tables, resulting in lower load performance. However, Multiload offers the advantage of not requiring an empty target table.
Initially, BTEQ was Teradata’s primary load utility. It remains beneficial for loading small data quantities but lacks several key features, such as advanced deadlock handling and the ability to accommodate multiple load sessions. TPUMP, which incorporates these functionalities more effectively, was subsequently developed as a solution.
The required load utility is determined primarily by the number of source rows and the byte consumption per data row.
Fastload and Multiload are recommended for handling large amounts of data, while BTEQ and TPump are better suited for smaller amounts. TPump is particularly useful for active warehousing as it allows for controlled data loading at a specific rate (measured in rows per second).
I use BTEQ for small data loads, TPUMP for moderate loads up to 500,000 records, and Fastload or Multiload for larger loads exceeding 500,000 records. However, it’s important to note that the size of the loaded rows may necessitate bulk load utilities even with smaller numbers of records.