What is TPump?
TPump stands for Teradata Parallel Data Pump.
TPump does not move data in large blocks like FastLoad or MultiLoad. Instead, it loads one row at a time, using row hash locks. Because each lock covers only a row hash rather than the whole table, TPump can run many concurrent INSERTs and UPDATEs against the same table.
TPump is not designed to bulk load huge amounts of data as fast as possible. Instead, it allows a continuous trickle feed of data to flow into the database.
The TPump Features
Real-time loading: Data from transactional systems often needs to reach the data warehouse as soon as possible. TPump supports near real-time updates from source systems into Teradata.
Throttling: When loading with TPump, you can define how many statements are sent to the database per minute. This is called the statement rate.
The statement rate can even be changed while a TPump load job is running. You might speed up the load during the nightly batch window and slow it down (reducing resource consumption) during the day, when business users are running their reports.
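To make the statement rate concrete, here is a minimal Python sketch of how a per-minute rate translates into pacing. It is an illustration only, not TPump code: the function names seconds_per_statement and schedule are hypothetical, the rates are made-up values, and TPump's own pacing logic may work differently.

```python
def seconds_per_statement(rate_per_minute):
    """Average gap between statements for a given per-minute statement rate."""
    if rate_per_minute <= 0:
        raise ValueError("rate_per_minute must be positive")
    return 60.0 / rate_per_minute


def schedule(num_statements, rate_per_minute, start=0.0):
    """Earliest send time (in seconds) for each statement at an even pace."""
    gap = seconds_per_statement(rate_per_minute)
    return [start + i * gap for i in range(num_statements)]


# Hypothetical rates: faster in the nightly batch window, slower during the day.
night = schedule(4, 1200)  # one statement every 0.05 seconds
day = schedule(4, 300)     # one statement every 0.2 seconds
```

Lowering the rate stretches the same number of statements over a longer period, which is what reduces the load on the system during business hours.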
DML Functionality: Like MultiLoad, TPump supports DML statements. This includes INSERT, UPDATE, and DELETE.
Advantages over Bulk Loading with FastLoad or MultiLoad: TPump can load tables that have unique secondary indexes (USI) and non-unique secondary indexes (NUSI). While FastLoad requires an empty target table, TPump, like MultiLoad, can load into a populated table.
Unlike FastLoad, which discards duplicate rows, TPump can load duplicates into a MULTISET table. Furthermore, unlike FastLoad and MultiLoad, TPump does not require you to drop triggers or referential integrity constraints before loading.
The most important TPump Limitations
- No concatenation of input data files is allowed.
- TPump can’t handle aggregates, arithmetic functions or exponentiation.
- SELECT statements are not allowed.
- No more than four IMPORT commands may be used in a single load task. This means that at most four files can be read directly in a single run.
- TPump performance will decrease if access logging is enabled on the target objects.
TPump comes with a tool called the TPump Monitor. This tool allows you to check the status of TPump jobs in real-time and to change the statement rate of running jobs.
You can start the monitor under Linux with the following command:
tpumpmon [-h] [TDPID/],[,]
TPump Error Handling
TPump uses one error table per target table. The error table stores a copy of each row that caused an error.
These are some common errors showing up in TPump loads:
2816: Failed to insert a duplicate row into TPump Target Table.
This error occurs when TPump encounters a duplicate row. Only the first row is inserted; the duplicates are discarded.
2817: Activity count greater than one for TPump UPDATE/DELETE.
This error occurs when a TPump UPDATE or DELETE statement affects more than one row.
2818: Activity count zero for TPump UPDATE or DELETE.
This error occurs when a TPump UPDATE or DELETE statement affects no rows, so the change did not take place.
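When an error table fills up, a quick tally by error code helps show which of these cases dominates. The following Python sketch assumes the error codes have already been read out of the error table into a list; the names TPUMP_ERRORS and summarize are hypothetical, and the descriptions simply restate the three errors above.

```python
from collections import Counter

# Short descriptions of the three common TPump errors listed above.
TPUMP_ERRORS = {
    2816: "duplicate row on INSERT",
    2817: "UPDATE/DELETE affected more than one row",
    2818: "UPDATE/DELETE affected no rows",
}


def summarize(error_codes):
    """Count error codes and attach a description to each one."""
    counts = Counter(error_codes)
    return {
        code: (TPUMP_ERRORS.get(code, "other error"), n)
        for code, n in sorted(counts.items())
    }


# Made-up sample codes, as they might appear in an error table.
summary = summarize([2816, 2818, 2816, 9999])
for code, (description, n) in summary.items():
    print(code, description, n)
```

Codes that are not in the mapping are grouped under "other error" rather than dropped, so unexpected failures remain visible in the summary.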
TPump is fully restartable as long as the log table and error tables are not dropped.
TPump Example Script
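.begin load errortable ET_CUST;
/* specify a PRIMARY KEY in the .FIELD command */
/* CUSTLAYOUT is an assumed layout name; the original script does not show one */
.layout CUSTLAYOUT;
.field CUSTID * varchar(5) key;
.field CUSTNAME * varchar(30);
.dml label INST;
/* assumes CUSTOMER has exactly these two columns, in this order */
insert into CUSTOMER
values (:CUSTID, :CUSTNAME);
.import infile CUSTDATA.txt
format vartext ','
layout CUSTLAYOUT
apply INST;
.end load;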