Teradata TPump

Roland Wenzlofsky

September 12, 2020

What is Teradata TPump?

TPump stands for Teradata Parallel Data Pump.

Teradata TPump does not move data in large blocks like FastLoad or MultiLoad. Instead, it loads one row at a time, using row hash locks. Because each statement locks only the affected row hash rather than the whole table, TPump can run many concurrent INSERTs and UPDATEs against the same table.

Teradata TPump is not designed to bulk-load vast amounts of data as fast as possible. Instead, it lets a continuous trickle feed of data flow into the database.

The Teradata TPump Features

Real-time loading: Data from transactional systems often needs to reach the data warehouse as soon as possible. TPump supports near real-time updates from source systems into Teradata.

Throttling: When loading with TPump, you can define how many statements are sent to the database per minute. This is called the statement rate.

We can even change the statement rate during the TPump load job execution. You might decide to speed up the load during the nightly batch window and slow it down (decreasing resource consumption) during the day when business users run their reports.
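
The idea of a per-minute statement rate that can be changed while the job runs can be illustrated with a small Python model. This is only a sketch of the concept, not TPump's actual pacing algorithm: the function name paced_send, the even spacing of statements, and the get_rate callback are assumptions made for illustration.

```python
import time


def paced_send(statements, get_rate, send, clock=time.monotonic, sleep=time.sleep):
    """Send statements one by one, no faster than get_rate() per minute.

    get_rate is re-read after every statement, so the rate can be changed
    while the loop is running (hypothetical model, not TPump internals).
    """
    next_slot = clock()  # earliest time the next statement may be sent
    sent = 0
    for statement in statements:
        delay = next_slot - clock()
        if delay > 0:
            sleep(delay)  # wait until the next slot opens
        send(statement)
        sent += 1
        rate = get_rate()
        if rate <= 0:
            raise ValueError("statement rate must be positive")
        # Space statements evenly: one slot every 60 / rate seconds.
        next_slot = max(next_slot, clock()) + 60.0 / rate
    return sent
```

Passing a get_rate function that returns a higher value at night and a lower one during the day mimics speeding up the load in the batch window and slowing it down during business hours.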

DML Functionality: Like MultiLoad, TPump supports DML statements: INSERT, UPDATE, and DELETE.

Advantages over Bulk Loading with FastLoad or MultiLoad: TPump can load tables that have unique (USI) and non-unique (NUSI) secondary indexes. While FastLoad requires the target table to be empty, TPump, like MultiLoad, can load into a populated table.

Unlike FastLoad and MultiLoad, TPump allows duplicate rows. Furthermore, you don’t have to drop triggers or referential integrity before loading.

The most critical TPump Limitations

  • No concatenation of input data files is allowed.
  • TPump can’t handle aggregates, arithmetic functions, or exponentiation.
  • The use of SELECT is not allowed.
  • No more than four IMPORT commands may be used in a single load task. In other words, a single run can read at most four input files.
  • TPump performance will decrease if access logging is enabled on the target objects.

Monitoring TPump

Teradata TPump comes with a tool called the TPump Monitor. This tool allows you to check the status of TPump jobs in real time and change the statement rate of running jobs.

You can start the monitor under Linux with the following command:

tpumpmon [-h] [TDPID/]<UserName>,<Password>[,<AccountID>]

TPump Error Handling

TPump uses one error table per target table. The error table stores a copy of each row that caused an error.

These are some common errors showing up in TPump loads:

2816: Failed to insert a duplicate row into the TPump Target Table.

This error occurs when TPump encounters a duplicate row. Teradata inserts only the first row; the duplicates are discarded.

2817: Activity count greater than one for TPump UPDATE/DELETE.

This error occurs when an UPDATE or DELETE issued by TPump affects more than one row.

2818: Activity count zero for TPump UPDATE or DELETE.

This error indicates that the UPDATE or DELETE affected no rows, typically because the target row does not exist.
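
The three error codes above follow a simple pattern based on the statement type and its activity count (the number of rows affected). The Python sketch below restates that pattern. The function classify and its parameters are hypothetical names used for illustration; whether TPump actually records a given case in the error table can depend on job options that are not covered here.

```python
# Error codes as listed in the text above.
DUPLICATE_INSERT = 2816
ACTIVITY_COUNT_GT_ONE = 2817
ACTIVITY_COUNT_ZERO = 2818


def classify(statement_kind, activity_count, duplicate=False):
    """Return the matching error code, or None if the outcome is normal.

    statement_kind: "INSERT", "UPDATE", or "DELETE" (case-insensitive).
    activity_count: number of rows the statement affected.
    duplicate: True if an INSERT hit an already existing identical row.
    """
    kind = statement_kind.upper()
    if kind == "INSERT":
        return DUPLICATE_INSERT if duplicate else None
    if kind in ("UPDATE", "DELETE"):
        if activity_count == 0:
            return ACTIVITY_COUNT_ZERO  # nothing was changed
        if activity_count > 1:
            return ACTIVITY_COUNT_GT_ONE  # more rows than expected
        return None
    raise ValueError(f"unsupported statement kind: {statement_kind}")
```

For example, an UPDATE that touches three rows maps to 2817, while an UPDATE that touches exactly one row is treated as a normal outcome.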

TPump Restartability

TPump is fully restartable as long as the log table and the error tables are not dropped.

TPump Example Script

.logtable TheDatabase.tpumplog;
.logon DWHPRO,******;
database TheDatabase;
.name test;
.begin load errortable ET_CUST
sleep 5
checkpoint 5
sessions 8
errlimit 4
pack 4
tenacity 4
serialize on;
/* SERIALIZE ON: specify a PRIMARY KEY in the .FIELD command */
.layout mylayout;
.field CUSTID * varchar(5) key;
.field CUSTNAME * varchar(30);
.dml label INST;
insert into CUSTOMER
(CUSTID,CUSTOMER_NAME
)
values
(:CUSTID,
:CUSTNAME
);
.import infile CUSTDATA.txt
format vartext ','
layout mylayout
apply INST;
.end load;
.logoff;
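
To try the script, you need a comma-delimited input file that matches the layout: two variable-length text fields, CUSTID (up to 5 characters) and CUSTNAME (up to 30 characters). The following Python sketch builds such content. The helper name to_vartext and the sample values are hypothetical, and the sketch simply rejects values that contain the delimiter instead of handling quoting.

```python
MAX_CUSTID = 5     # matches: .field CUSTID * varchar(5)
MAX_CUSTNAME = 30  # matches: .field CUSTNAME * varchar(30)


def to_vartext(rows, delimiter=","):
    """Render (custid, custname) pairs as delimited text, one record per line."""
    lines = []
    for custid, custname in rows:
        if len(custid) > MAX_CUSTID:
            raise ValueError(f"CUSTID too long: {custid!r}")
        if len(custname) > MAX_CUSTNAME:
            raise ValueError(f"CUSTNAME too long: {custname!r}")
        if delimiter in custid or delimiter in custname:
            raise ValueError("field values must not contain the delimiter")
        lines.append(f"{custid}{delimiter}{custname}")
    return "".join(line + "\n" for line in lines)


def write_sample(path, rows):
    """Write the rendered records to a file such as CUSTDATA.txt."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(to_vartext(rows))
```

Calling write_sample("CUSTDATA.txt", [("10001", "Alice"), ("10002", "Bob")]) produces a two-line file in the form that the .import command in the script expects.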
