fbpx

The Teradata Fastload Utility

By Roland Wenzlofsky

August 25, 2020


What is a Teradata FastLoad?

Fastload is a Teradata Bulk Load Utility used to load vast amounts of data as quickly and efficiently as possible. As a bulk load utility, it bypasses the transient journal and loads data in blocks. The Fastload utility first assembles the client-side data blocks and then sends them to Teradata, minimizing network load. Loading data blocks instead of row by row are much more efficient for large tables than loading with BTEQ or TPump. Depending on the table and settings’ size, the optimal number of parallel sessions can be determined automatically by Teradata. The performance of a fastload can be increased by reducing the number of I/Os. Multi Value Compression, for example, is suitable for this.

When should the Fastload be used?

Teradata allows only a limited number of bulk load utilities to execute at the same time, as they consume a lot of resources. For example, a FastLoad needs three AMP Worker Tasks (AWT) during the acquisition and one AWT during the apply phases. AWT is a scarce AMP resource and can cause AMP blocking situations. To address this issue, each AMP limits the percentage of AWTs that can be assigned to bulk load utilities. The exact number is system-dependent and typically lies in the lower double-digit area.

In this context, despite FastLoad being the fastest option to import large amounts of data, it can become a bottleneck and cause processing delays. As part of the tuning process, the “loading landscape” must be analyzed to identify which data should be loaded via bulk load utilities. It might be possible to offload loads to transactional ones (i.e., BTEQ or TPump). Such an analysis must be periodically reassessed to validate that the original assumptions still hold.

In my career as Teradata professional, I have seen a lot of performance problems caused by the inappropriate use of fastloads. Each Teradata System has a sepcific limit on the number of parallel fastloads.

From experience I would not recommend using the Teradata fastload utility for loads of less than 100.000 rows; 100.000 rows are just a rule of thumb, but a good starting point.

Smaller amount of data can be loaded with the transactional utilities. BTEQ performs good for less than 100.000 rows, TPUMP for up to 500.000 rows; of course this has to be evaluated case by case; for example very wide rows could be a reason to use fast loading even if the number of rows is low.

I remember one client who was loading almost every single table with fast load, even the empty ones. As a result, nightly batch loads were heavily delayed as load jobs had to wait for free utility slots and got queued. After a redesign, switching small tables to TPumo  (with an arbitrarily chosen limit of 100.000 rows) , the average oad times decreased by about 50%!

The easiest way to analyze your load landscape is to use DBC.DBQLOGTBL. You can query the rows where column StatementType equals “Fastload”. The column “UtilityRowCount” is the number of rows which has been loaded. All loads with a UtilityRowCount lass than 100.000 rows are candidates for being converted to transactional loads with BTEQ or TPUMP.

FastLoad Limitations

  • No secondary indexes are allowed on the target table. Secondary indexes (USI, NUSI) are stored in sub-tables, the unique secondary index rows (USI), even on a different AMP than the data row. Maintaining secondary indexes during the Fastload would decrease load performance. If we need secondary indexes, we can drop them before loading and recreate them afterward.
  • No referential integrity on the target table allowed. Similarly to secondary indexed, checking referential integrity between tables would slow down the load. FastLoad is designed to load a table in the fastest way possible, not to ensure referential integrity. If you need referential integrity, drop the constraints before loading, and add them afterward again.
  • No triggers are allowed on the target table. Usually, triggers are used to maintain other tables. Furthermore, a trigger typically uses more than one AMP. If you need the trigger, disable it before loading, and enable it again afterward.
  • No duplicate input rows are supported (only for NOPI tables). Duplicate Rows are not supported for PI tables. When the target table of a Fastload is a multiset table, duplicate rows are discarded from the input data. The duplicate row check is too expensive and negatively impacts load performance.
  • All AMPs have to be up.FastLoad requires all the AMPs to be up. If one or more AMPs go down, they must be restarted before the load can be continued.
  • For each target column, only one data type conversion is allowed.

The Teradata Fastload Phases

The Acquisition Phase

In the acquisition phase, the Fastload transfers data from the client to the AMPs. For maximum performance, the parsing engine is not hashing each table row by the primary index columns. It opens sessions from the FastLoad client to the AMPs. The number of sessions recommended depends on the system’s size and defined with the SESSION command.

All client sessions start sending data blocks to the AMPs; Larger data blocks ensure better load performance. The performance of the Acquisition Phase is highly dependent on the source system and network connectivity. These two factors become more restrictive the larger and more powerful the Teradata system is.

The data blocks are distributed in a round-robin fashion to all AMPs. As rows are initially not on their target AMP (defined by the table’s primary index), each AMP will calculate the PI of the received rows and send them to the the target AMP.  When fast loading, the AMPs are doing the hashing, which is usually done by the parsing engine. At the end of the acquisition phase, the rows are located on the target AMPs, but in a spool table,  still not being sorted by the rowhash.

The Apply Phase

When all AMPs are done receiving data blocks, the second phase of the Fastload starts. Each AMP sorts the rows in its spool table and copies them into the target table. The Apply phase is not done for NOPI tables.

The apply phase increases in time logarithmically as the data volume increases; each time the data volume increases eightfold, the number of read and write I/Os doubles.

The time required for a fastload is the sum of the runtimes for the acquisition phase and the apply phase.

Restartability of FastLoads

A Fastload can be restarted for several reasons. The Teradata system might have been gone down, or the filesystem containing the data went is not reachable. A restarted Fastload continues where it was interrupted.

FastLoads can’t be restarted in the following situations:

  • The error tables have been dropped.
  • The target table was dropped.
  • The target table is recreated.

FastLoads can be restarted in the following situations:

  • The error tables have not bee dropped and
  • The target table is not dropped and
  • The target table is not created in the script and
  • Checkpoints have been defined.

An Example FastLoads Script

SESSIONS 16;
RECORD 2 THRU 1000;
ERRLIMIT 5;
LOGON SYS1/dwhpro,dwhpro123

DROP TABLE TheTable;
DROP TABLE TheTableError1;
DROP TABLE TheTableError2;

CREATE TABLE TheTable
(
PK INTEGER,
Age INTEGER,
ColA CHAR(10)
)  PRIMARY INDEX(PK);

DEFINE PK (INTEGER), Age (INTEGER), ColA (CHAR(10))

FILE=TheFile;
SHOW;
BEGIN LOADING TheTable ERRORFILES TheTableError1,TheTableError2
CHECKPOINT 10000;

INSERT INTO TheTable (PK, Age, ColA) VALUES  (:PK, :Age, :ColA);

END LOADING;
LOGOFF;

__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"62516":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"62516":{"val":"var(--tcb-skin-color-0)"}},"gradients":[]},"original":{"colors":{"62516":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"b4fbe":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"b4fbe":{"val":"rgb(241, 99, 52)"}},"gradients":[]},"original":{"colors":{"b4fbe":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
Previous Article
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"b4fbe":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"b4fbe":{"val":"rgb(241, 99, 52)"}},"gradients":[]},"original":{"colors":{"b4fbe":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
Next Article
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"62516":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"62516":{"val":"rgb(255, 0, 0)"}},"gradients":[]}}]}__CONFIG_colors_palette__
GET OUR TERADATA BOOK

Roland Wenzlofsky

Roland Wenzlofsky is an experienced freelance Teradata Consultant & Performance Trainer. Born in Austria's capital Vienna, he is building and tuning some of the largest Teradata Data Warehouses in the European financial and telecommunication sectors for more than 20 years. He has played all the roles of developer, designer, business analyst, and project manager. Therefore, he knows like no other the problems and obstacles that make many data warehouse projects fail and all the tricks and tips that will help you succeed.

  • Hello Roland,

    While trying to use FastLoad simple text file with three columns, I’m getting this error:
    I/O Error on File Checkpoint: 42, Text: Unable to obtain data signature
    Would you have any idea what is causing this error? I have modified your script to adjust to my particular circumstances and prepared a simple three-column text file with data types like integer, varchar (30) and char (5). I used Notepad++ as a text editor to create my text file. It would be very nice of you if could point out what is wrong.

    regards,
    Dominik

  • Avatar
    Aleksei Svitin says:

    Hello,

    Just a small addition.
    Sometimes it is good to run a FastLoad against a non-PI table. In this case, the redistribution operation at the Application phase may be skipped and by this, we will have the fastest way to write new data into Teradata.

    Best regards,
    Aleksei.

    • Avatar
      Christiana says:

      Thanks for Shangri. What a pleasure to read!

  • Yes, this is correct. In the acquisition phase, data blocks are moved “round-robin” from the client session to an AMP, which is usually not the target AMP. In this phase, it’s all about getting data blocks fast to the database. Each AMP disassembles the data blocks it receives and hashes the primary index columns of the contained rows to find out the correct AMP for each row and sends it over to this AMP.

    Moving 64KB blocks without checking for the right AMP is much faster than making this decision at the begin on a row by row basis.

    • Avatar
      Jayant Singh says:

      Thanks for the explanation, Roland. I understand it now.

  • Hello Roland,
    Thank you for the nice explanation about fastload. I have a question about the phases if you don’t mind.

    During the acquisition phase, it says “rows are not ending up on the target AMP “. In the next paragraph, it says, ” Each AMP will, therefore, hash the received rows by the primary index columns, and send them to the correct AMP”.

    I am a bit confused here. Does it mean that the rows are not loaded into the correct AMP during the acquisition phase? Can you please explain a bit more here.

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like

    >