Teradata TPump: What it is and How it Facilitates Real-Time Data Updates

Roland Wenzlofsky

April 20, 2023

minutes reading time

What is Teradata TPump?

TPump is short for Teradata Parallel Data Pump.

Teradata TPump loads data one row at a time, utilizing row hash locks to facilitate numerous concurrent INSERTs and UPDATEs on a table. This approach differs from Fastload or Multiload, which transfer data in large blocks.

Teradata TPump is not intended for loading large volumes of data rapidly. Instead, it facilitates a steady stream of data to enter the database.

The Teradata TPump Features

Transactional systems require rapid data transfer to the data warehouse. TPump facilitates near real-time updates from source systems to Teradata.

Throttling: TPump enables you to establish a statement rate for data updates during loading, indicating the desired frequency of updates per minute.

The statement rate can be adjusted during TPump job execution. It may be expedient to accelerate the load process during nightly batch processing and decelerate it (thus reducing resource utilization) when business users generate their reports during the day.

DML Functionality: TPump can do DML functions (like Multiload). This includes INSERT, UPDATE, and DELETE statements.

TPump offers advantages over bulk loading with Fastload or Multiload by allowing for the use of USI and NUSI. Unlike Fastload, which requires the target table to be empty, TPump can load data into an already populated table, including tables using Multiload.

Row duplicates are permissible in contrast to Fastload and Multiload. Additionally, dropping triggers or referential integrity is not necessary when loading.

The most critical TPump Limitations

No concatenation of input data files is allowed.
TPump can’t handle aggregates, arithmetic functions, or exponentiation.
The use of SELECT is not allowed.
A single load task may use no more than four IMPORT commands. We can directly read most four files in a single run.
TPump performance will decrease if access logging is enabled on the target objects.

Monitoring TPump

The TPump Monitor tool, included with Teradata TPump, enables real-time monitoring of TPump job statuses and permits modification of statement rates for active jobs.

To initiate the monitor on LINUX, use the command:

tpumpmon [-h] [TDPID/],[,]

TPump Error Handling

Each target table in TPump has its own dedicated error table that retains a duplicate of every erroneous row.

Common TPump load errors include:

2816: Failed to insert a duplicate row into the TPump Target Table.

TPump encountered a duplicate row error, causing Teradata to insert the initial record solely while discarding the duplicates.

2817: Activity count greater than one for TPump UPDATE/DELETE.

TPump encounters this error when it attempts to UPDATE or DELETE additional rows.

2818: Activity count zero for TPump UPDATE or DELETE.

This indicates that the UPDATE or DELETE operation failed.

TPump Restartability

TPump can be restarted completely, provided the log and error tables have not been dropped.

TPump Example Script

.logtable TheDatabase.tpumplog;
.logon DWHPRO,******;
database TheDatabase;
.name test;
.begin load errortable ET_CUST
sleep 5
checkpoint 5
sessions 8
errlimit 4
pack 4
tenacity 4
serialize on;
specify a PRIMAY KEY in the .FILED command */
.layout mylayout;
.field CUSTID * varchar(5) key;
.field CUSTNAME * varchar(30);
.dml label INST;
insert into CUSTOMER
(CUSTID,CUSTOMER_NAME
)
values
(:CUSTID,
:CUSTNAME
);
.import infile CUSTDATA.txt
format vartext ','
layout mylayout
apply INST;
.end load;
.logoff;

The Fundamentals of FastLoading on Teradata