Teradata has several ways of external loading data. The Teradata load utilities can be divided into two groups:
The first group allows us to bring the data as fast as possible into the target table by bypassing the transient journal. The second group uses the transient journal but is slower.
Each group has its advantages and disadvantages. Depending on the requirements, you may choose to load your data either way.
Historically, each load utility has been a separate tool offering its level of usability and syntax. Still, now Teradata provides a standard tool, which is thought to accumulate all different instruments’ functionality: the Teradata Parallel Transporter (TPT). Nevertheless, all specialized tools still exist and can be used.
Many useful new features have been added into TPT, but the new scripting language is inconvenient.
The Teradata Utilities for Loading
The transactional load utilities are called BTEQ and TPump. The ones bypassing the transient journal are the Fastload and the Multiload. They are called bulk load utilities.
This article is not thought to teach you the syntax of these tools. Please take a look at the Offical Teradata Documentation if you are interested in the syntax. This article aims to give you advice about how and when each tool should be used.
Fastload and Multiload are the most efficient load utilities, as the data is loaded in blocks of 64 kilobytes of data. BTEQ and TPump insert each row separately, using transaction handling (offering rollback functionality).
As no transaction log is used, the following restrictions exist for bulk load utilities:
- Indexes, triggers, and referential integrity rules have to be removed before loading.
- Fastload requires the target table to be empty.
The set tables’ loading requires a duplicate row check, but as the data is loaded in blocks, no duplicate row checks can be done. Teradata solves this problem by filtering rows before they reach the database.
Multiload, the second bulk load utility, is not restricted to loading but can do additional tasks, such as UPDATEs, DELETEs, and UPSERTs. It uses temporary tables and therefore does not reach the load performance of a Fastload. The advantage over Fastloads is that the target table does not have to be empty.
BTEQ was the first Teradata load utility available, and it’s still very useful for loading small amounts of data. Unfortunately, it’s missing features like multiple load sessions and advanced deadlock handling. Therefore the TPUMP utility was introduced, which implements these features in a better way.
The decision about the required load utility is mainly based on the number of source rows and the number of bytes each data row is consuming.
In principle, the bulk loading tools Fastload and Multiload should be used for the huge amount of data, while BTEQ and TPump are intended to be used for smaller data quantities. TPump offers the possibility to load the data at a particular rate (rows per second), making it an appropriate tool for active warehousing.
Personally, I use BTEQ for loading up to a few thousand records, TPUMP for loading up to 500.000 records, and Fastloads or Multiloads for more than 500.000 records. These numbers above are just a rule of thumb, as full rows may force you to use bulk load utilities already with a smaller number of rows.