Teradata FastloadWhat is a FastLoad?

FastLoad is known for its speed in loading massive amounts of data from flat files into a table.

The FastLoad utility achieves this, by skipping the transient journal, which requires the target table to be empty. If the target table has data, you would need to use MultiLoad instead. It can handle populated tables but will be slower than the FastLoad.

How FastLoad works

There are some other optimizations, apart from skipping the transient journal, which make FastLoad perform so well.

The FastLoad utility assembles data into 64KB blocks before sending it to the database and can use multiple sessions at the same time, taking advantage of the parallel architecture.

Block level loading is a huge advantage over BTEQ and TPump, which load data row by row, using the transient journal.

On smaller Teradata Systems, Fastload will use one session per AMP to maximize parallel processing. Systems with a massive amount of AMPs usually limit the number of parallel Fastload sessions.

FastLoad Limitations

FastLoad comes with several limitations; we have to consider in our load design.

  1. No Secondary Indexes are allowed on the Target Table. Secondary indexes (USI, NUSI) are stored in sub-tables, the unique secondary index rows (USI) even on a different AMP than the data row. Maintaining secondary indexed would slow down the load experience. Therefore, secondary indexes are not supported.If you need secondary indexes, drop them before loading, and recreate them afterward. Not a big deal.
  2. No referential integrity on the target table. Similarly to secondary indexed, checking referential integrity between tables would slow down the load. FastLoad is designed to load a table in the fastest way possible, not to ensure referential integrity.If you need referential integrity, drop the constraints before loading, and add them afterward again.
  3. No triggers. Usually, triggers are used to maintain other tables. Furthermore, a trigger typically uses more than one AMP. If you need the trigger, disable it before loading, and enable it again afterward.
  4. No Duplicate Rows. Duplicate Rows are not supported. When the target table of a Fastload is a MULTISET table, duplicate rows are discarded from the input data. The duplicate row check would be too expensive and negatively impact load performance.
  5. All AMPs have to be up. FastLoad requires all AMPs to be up. If one or more AMPs go down, they have to be restarted, before the load can be continued.
  6. Only one level of data type conversion. For each target column, only one data type conversion is allowed, as this is a very expensive process.

The FastLoad Phases

FastLoad consists of two phases. Incoming data is evenly distributed across the AMPs.

The Acquisition Phase

In the acquisition phase, the Fastload transfers data from the host to the AMPs. For maximum performance, the parsing engine is not hashing each table row by the primary index columns.

It only will open sessions from the FastLoad client to the AMPs. The number of sessions recommended depends on the size of the system, and can be defined with the SESSION command.

All client sessions start sending blocks of 64KB to the AMPs. These large data blocks guarantee fast load performance.

Data blocks are distributed in a round-robin fashion to the AMPs. This means that usually, rows are not ending up on the target AMP (which is defined by the table’s primary index).

Each AMP will, therefore, hash the received rows by the primary index columns, and send them to the correct AMP. In this case, the AMPs are doing the hashing, which is usually done by the parsing engine. Rows will be located on the right AMP, but in a worktable, not sorted by row hash.

The Application Phase

When all AMPs are done receiving data blocks, the second phase of the Fastload starts, the Application Phase.

In this phase, each AMP sorts the rows in its worktable and puts them into the target table.

Restartability of FastLoads

We may need to restart a Fastload for several reasons. The Teradata system may be restarted in the middle of the load, or the filesystem containing the loaded data went down. A restarted Fastload will continue where it was interrupted.

FastLoads can’t be restarted in the following situations:

  • The error tables have been dropped
  • The target table was dropped
  • The target table is recreated

FastLoads can be restarted in the following situations:

  • The error tables have not bee dropped AND
  • The target table is not dropped AND
  • The target table is not created in the script AND
  • Checkpoints have been defined

An Example FastLoads Script

SESSIONS 16;
RECORD 2 THRU 1000;
ERRLIMIT 5;
LOGON SYS1/dwhpro,dwhpro123

DROP TABLE TheTable;
DROP TABLE TheTableError1;
DROP TABLE TheTableError2;

CREATE TABLE TheTable
(
PK INTEGER,
Age INTEGER,
ColA CHAR(10)
)  PRIMARY INDEX(PK);

DEFINE PK (INTEGER), Age (INTEGER), ColA (CHAR(10))

FILE=TheFile;
SHOW;
BEGIN LOADING TheTable ERRORFILES TheTableError1,TheTableError2
CHECKPOINT 10000;

INSERT INTO TheTable (PK, Age, ColA) VALUES  (:PK, :Age, :ColA);

END LOADING;
LOGOFF;

Questions?
If you have any questions about all this, please ask in the comments! I’ll be paying close attention and answering as many as I can. Thank you for reading. Whatever this blog has become, I owe it all to you. Don’t forget to rate this article with 5 stars if you like it!
Our Reader Score
[Total: 11    Average: 4.4/5]
The Fundamentals of FastLoading on Teradata written by Roland Wenzlofsky on February 27, 2017 average rating 4.4/5 - 11 user ratings

6 COMMENTS

  1. Hello Roland,

    While trying to use FastLoad simple text file with three columns, I’m getting this error:
    I/O Error on File Checkpoint: 42, Text: Unable to obtain data signature
    Would you have any idea what is causing this error? I have modified your script to adjust to my particular circumstances and prepared simple three column text file with data type as integer, varchar (30) and char (5). I used Notepad++ as a text editor to create my text file. It would be very nice of you if could point out what is wrong.

    regards,
    Dominik

  2. Hello,

    Just small addition.
    Sometimes it is good to run a FastLoad against non-PI table. In this case the redistribution operation at Application phase may be skip and by this we will have the fastest way to write a new data into teradata.

    Best regards,
    Aleksei.

  3. Yes, this is correct. In the acquisition phase, data blocks are moved “round-robin” from the client session to an AMP, which is usually not the target AMP. In this phase, it’s all about getting data blocks fast to the database. Each AMP disassembles the data blocks it receives and hashes the primary index columns of the contained rows to find out the correct AMP for each row and sends it over to this AMP.

    Moving 64KB blocks without checking for the right AMP is much faster than making this decision at the begin on a row by row basis.

  4. Hello Roland,
    Thank you for the nice explanation about FASTLOAD. I have a question about the phases if you don’t mind.

    During the acquisition phase, it says “rows are not ending up on the target AMP “. In the next paragraph, it says, ” Each AMP will, therefore, hash the received rows by the primary index columns, and send them to the correct AMP”.

    I am bit confused here. Does it mean that the rows are not loaded into the correct AMP during the acquisition phase? Can you please explain a bit more here.

LEAVE A REPLY

Please enter your comment!
Please enter your name here