What is a Teradata FastLoad?
Fastload is a highly efficient Teradata Bulk Load Utility designed to rapidly load vast amounts of data. It bypasses the transient journal and loads data in bulk as blocks. The utility first compiles the client-side data blocks and then transmits them to Teradata, thereby minimizing network load. Compared to BTEQ or TPump, loading data blocks instead of row by row is a far more effective approach for larger tables. Teradata can automatically establish the optimal number of parallel sessions based on the table size and settings. Reducing the number of I/Os can enhance Fastload’s performance, which Multi-Value Compression accomplishes easily.
When should the Teradata Fastload be used?
Due to their high resource consumption, Teradata restricts the number of bulk load utilities that can run concurrently. For instance, the acquisition phase of FastLoad necessitates three AMP Worker Tasks (AWT), while the APPLY phase necessitates one AWT. Since AWTs are a scarce resource and can lead to AMP-blocking situations, each AMP limits the proportion of AWTs allocated to bulk load utilities. The number varies depending on the system and typically falls within the lower double-digit range.
Teradata FastLoad is the most rapid method to import data in large quantities. However, it can cause delays in processing and become a bottleneck. As part of fine-tuning, the loading environment must be examined to determine which data should be loaded through bulk load utilities. It may be feasible to transfer loads to transactional ones, such as BTEQ or TPump. To affirm the initial assumptions, this assessment must be periodically reevaluated.
During my time as a Teradata expert, I have encountered numerous instances where poor utilization of fastloads has resulted in significant performance issues. It is important to note that each Teradata System possesses a defined restriction on the number of concurrent fastloads allowed.
Based on my experience, I advise against utilizing the Teradata fastload utility for loads consisting of fewer than 100,000 rows. Although this number is a general guideline, it is an effective starting point.
Transactional utilities have varying data limitations. BTEQ is ideal for loading under 100,000 rows, and TPUMP can manage up to 500,000. It is important to assess each situation independently. For example, if the number of rows is small but extensive, fast loading may still be required.
I recall a client who loaded nearly all tables, including the empty ones, with FASTLOAD. This caused significant delays in the nightly batch loads as the load jobs had to wait for available utility slots and was subsequently queued. However, after redesigning small tables to TPumo (with a predetermined limit of 100,000 rows), the average load times decreased by approximately 50%.
To analyze your load landscape efficiently, utilize DBC.DBQLOGTBL and filter rows where StatementType is “Fastload”. The “UtilityRowCount” column represents the loaded row count. Loads with a UtilityRowCount lower than 100,000 are suitable for conversion to transactional loads with BTEQ or TPUMP.
Teradata FastLoad Limitations
- No secondary indexes are allowed on the target table. Secondary indexes (USI, NUSI) are stored in sub-tables, the unique secondary index rows (USI), even on a different AMP than the data row. Maintaining secondary indexes during the Fastload would decrease load performance. If we need secondary indexes, we can drop them off before loading and recreate them afterward.
- No referential integrity on the target table is allowed. Like secondary indexed, checking referential integrity between tables would slow the load. FastLoad is designed to load a table fastest, not to ensure referential integrity. If you need referential integrity, drop the constraints before loading, and add them afterward.
- No triggers are allowed on the target table. Usually, triggers are used to maintain other tables. Furthermore, a trigger typically uses more than one AMP. If you need the trigger, disable it before loading, and enable it again afterward.
- No duplicate input rows are supported (only for NOPI tables). Duplicate Rows are not endorsed for PI tables. When the target table of a Fastload is a multiset table, duplicate rows are discarded from the input data. The duplicate row check is too expensive and negatively impacts load performance.
- All AMPs have to be up.FastLoad requires all the AMPs to be up. If one or more AMPs go down, they must be restarted before the load can be continued.
- For each target column, only one data type conversion is allowed.
The Teradata Fastload Phases
The Acquisition Phase
During acquisition, Fastload moves data from the client to the AMPs without hashing each table row by the primary index columns to optimize performance. Fastload establishes sessions from the client to the AMPs, with the recommended number of sessions depending on the system’s size and defined using the SESSION command.
Client sessions initiate data block transmission to the AMPs to optimize load performance. The Acquisition Phase’s efficacy hinges largely on the source system and network connectivity, which can increasingly constrain performance as the Teradata system grows in size and power. Using larger data blocks further improves load performance.
Data blocks are distributed among all AMPs in a round-robin manner. Rows, initially located on non-target AMPs (determined by the primary index of the table), are sent to the target AMP after the AMP calculates the PI. During fast loading, hashing is typically done by the parsing engine. The rows are located on the target AMPs by the end of the acquisition phase but remain unsorted by rowhash in a spool table.
The Application Phase
Upon the completion of data block reception by all AMPs, the second phase of Fastload commences. Each AMP meticulously arranges the rows in its spool table and then proceeds to transfer them to the target table. It is worth noting that the APPLICATION phase is not executed for NOPI tables.
The APPLICATION phase experiences a logarithmic increase as the volume of data grows. Specifically, every eightfold increase in data volume doubles both read and write input/output operations.
The duration of a fastload is the total of the acquisition and application phase runtimes.
Restartability of FastLoads
Fastload may restart due to a system outage or an inaccessible data filesystem. The process resumes from its interruption point.
FastLoads cannot be restarted in the following scenarios:
- The error tables have been dropped.
- The target table was dropped.
- The target table is recreated.
FastLoads may be restarted in the following scenarios:
- The error tables have not been dropped and
- The target table is not dropped and
- The target table is not created in the script and
- Checkpoints have been defined.
An Example FastLoads Script
SESSIONS 16;
RECORD 2 THRU 1000;
ERRLIMIT 5;
LOGON SYS1/dwhpro,dwhpro123
DROP TABLE TheTable;
DROP TABLE TheTableError1;
DROP TABLE TheTableError2;
CREATE TABLE TheTable
(
PK INTEGER,
Age INTEGER,
ColA CHAR(10)
) PRIMARY INDEX(PK);
DEFINE PK (INTEGER), Age (INTEGER), ColA (CHAR(10))
FILE=TheFile;
SHOW;
BEGIN LOADING TheTable ERRORFILES TheTableError1,TheTableError2
CHECKPOINT 10000;
INSERT INTO TheTable (PK, Age, ColA) VALUES (:PK, :Age, :ColA);
END LOADING;
LOGOFF;
Hello Roland,
While trying to use FastLoad simple text file with three columns, I’m getting this error:
I/O Error on File Checkpoint: 42, Text: Unable to obtain data signature
Would you have any idea what is causing this error? I have modified your script to adjust to my particular circumstances and prepared a simple three-column text file with data types like integer, varchar (30) and char (5). I used Notepad++ as a text editor to create my text file. It would be very nice of you if could point out what is wrong.
regards,
Dominik
Hello,
Just a small addition.
Sometimes it is good to run a FastLoad against a non-PI table. In this case, the redistribution operation at the Application phase may be skipped and by this, we will have the fastest way to write new data into Teradata.
Best regards,
Aleksei.
Thanks for Shangri. What a pleasure to read!
Yes, this is correct. In the acquisition phase, data blocks are moved “round-robin” from the client session to an AMP, which is usually not the target AMP. In this phase, it’s all about getting data blocks fast to the database. Each AMP disassembles the data blocks it receives and hashes the primary index columns of the contained rows to find out the correct AMP for each row and sends it over to this AMP.
Moving 64KB blocks without checking for the right AMP is much faster than making this decision at the begin on a row by row basis.
Thanks for the explanation, Roland. I understand it now.
Hello Roland,
Thank you for the nice explanation about fastload. I have a question about the phases if you don’t mind.
During the acquisition phase, it says “rows are not ending up on the target AMP “. In the next paragraph, it says, ” Each AMP will, therefore, hash the received rows by the primary index columns, and send them to the correct AMP”.
I am a bit confused here. Does it mean that the rows are not loaded into the correct AMP during the acquisition phase? Can you please explain a bit more here.