What is a FastLoad?
FastLoad is known for its speed in loading massive amounts of data from flat files into a table.
The FastLoad utility achieves this, by skipping the transient journal, which requires the target table to be empty. If the target table has data, you would need to use MultiLoad instead. It can handle populated tables but will be slower than the FastLoad.
How FastLoad works
There are some other optimizations, apart from skipping the transient journal, which make FastLoad perform so well.
The FastLoad utility assembles data into 64KB blocks before sending it to the database and can use multiple sessions at the same time, taking advantage of the parallel architecture.
Block level loading is a huge advantage over BTEQ and TPump, which load data row by row, using the transient journal.
On smaller Teradata Systems, Fastload will use one session per AMP to maximize parallel processing. Systems with a massive amount of AMPs usually limit the number of parallel Fastload sessions.
FastLoad comes with several limitations; we have to consider in our load design.
- No Secondary Indexes are allowed on the Target Table. Secondary indexes (USI, NUSI) are stored in sub-tables, the unique secondary index rows (USI) even on a different AMP than the data row. Maintaining secondary indexed would slow down the load experience. Therefore, secondary indexes are not supported.If you need secondary indexes, drop them before loading, and recreate them afterward. Not a big deal.
- No referential integrity on the target table. Similarly to secondary indexed, checking referential integrity between tables would slow down the load. FastLoad is designed to load a table in the fastest way possible, not to ensure referential integrity.If you need referential integrity, drop the constraints before loading, and add them afterward again.
- No triggers. Usually, triggers are used to maintain other tables. Furthermore, a trigger typically uses more than one AMP. If you need the trigger, disable it before loading, and enable it again afterward.
- No Duplicate Rows. Duplicate Rows are not supported. When the target table of a Fastload is a MULTISET table, duplicate rows are discarded from the input data. The duplicate row check would be too expensive and negatively impact load performance.
- All AMPs have to be up. FastLoad requires all AMPs to be up. If one or more AMPs go down, they have to be restarted, before the load can be continued.
- Only one level of data type conversion. For each target column, only one data type conversion is allowed, as this is a very expensive process.
The FastLoad Phases
FastLoad consists of two phases. Incoming data is evenly distributed across the AMPs.
The Acquisition Phase
In the acquisition phase, the Fastload transfers data from the host to the AMPs. For maximum performance, the parsing engine is not hashing each table row by the primary index columns.
It only will open sessions from the FastLoad client to the AMPs. The number of sessions recommended depends on the size of the system, and can be defined with the SESSION command.
All client sessions start sending blocks of 64KB to the AMPs. These large data blocks guarantee fast load performance.
Data blocks are distributed in a round-robin fashion to the AMPs. This means that usually, rows are not ending up on the target AMP (which is defined by the table’s primary index).
Each AMP will, therefore, hash the received rows by the primary index columns, and send them to the correct AMP. In this case, the AMPs are doing the hashing, which is usually done by the parsing engine. Rows will be located on the right AMP, but in a worktable, not sorted by row hash.
The Application Phase
When all AMPs are done receiving data blocks, the second phase of the Fastload starts, the Application Phase.
In this phase, each AMP sorts the rows in its worktable and puts them into the target table.
Restartability of FastLoads
We may need to restart a Fastload for several reasons. The Teradata system may be restarted in the middle of the load, or the filesystem containing the loaded data went down. A restarted Fastload will continue where it was interrupted.
FastLoads can’t be restarted in the following situations:
- The error tables have been dropped
- The target table was dropped
- The target table is recreated
FastLoads can be restarted in the following situations:
- The error tables have not bee dropped AND
- The target table is not dropped AND
- The target table is not created in the script AND
- Checkpoints have been defined
An Example FastLoads Script
RECORD 2 THRU 1000;
DROP TABLE TheTable;
DROP TABLE TheTableError1;
DROP TABLE TheTableError2;
CREATE TABLE TheTable
) PRIMARY INDEX(PK);
DEFINE PK (INTEGER), Age (INTEGER), ColA (CHAR(10))
BEGIN LOADING TheTable ERRORFILES TheTableError1,TheTableError2
INSERT INTO TheTable (PK, Age, ColA) VALUES (:PK, :Age, :ColA);
If you have any questions about all this, please ask in the comments! I’ll be paying close attention and answering as many as I can. Thank you for reading. Whatever this blog has become, I owe it all to you. Don’t forget to rate this article with 5 stars if you like it!
[Total: 10 Average: 4.4/5]
The Fundamentals of FastLoading on Teradata written by Roland Wenzlofsky on February 27, 2017 average rating 4.4/5 - 10 user ratings