Teradata – An Introduction to TPump

0
482

Teradata Quiz
Teradata Quiz
Price: Free

What is TPump?

TPump stands for Teradata Parallel Data Pump.

TPump does not move data in large blocks like Fastload or Multiload. Instead, it loads one row at a time, using row hash locks. Using row hash locks allows TPump to do many concurrent INSERTs and UPDATEs on a table.

TPump is not designed to bulk load huge amounts of data at once (and as fast as possible) but allows a trickle-feed of data to flow into the database.

The TPump Features

Real-time loading: Transactional systems need to get the data as soon as possible into the data warehouse. TPump allows performing near real-time updates from source systems into Teradata.

Throttling: When loading with TPump you can define how many updates should be done per minute. This is called the statement rate.

The statement rate can be even changed during the execution of a TPump load job. You might decide to speed up the load during the nightly batch window and slow it down (decreasing resource consumption) during the day when business users are running their reports.

DML Functionality: TPump can do DML functions (like Multiload). This includes INSERT, UPDATE and DELETE statements.

Advantages over Bulk Loading with Fastload or Multiload: TPump allows both USI and NUSI. While Fastload requires that the target table is empty, TPump can load, such as the Multiload, into a populated table.

Unlike Fastload and Multiload, row duplicates are allowed. Furthermore, you don’t have to drop triggers or referential integrity before loading.

The most important TPump Limitations

  • No concatenation of input data files is allowed.
  • TPump can’t handle aggregates, arithmetic functions or exponentiation.
  • The use of the SELECT is not allowed.
  • No more than four IMPORT commands may be used in a single load task. This means that a most, four files can be directly read in a single run.
  • TPump performance will decrease if access logging is enabled on the target objects.

Monitoring TPump

TPump comes with a tool called the TPump Monitor. This tool allows you to check the status of TPump jobs in real-time and to change the statement rate of running jobs.

You can start the monitor under LINUX with the following command:

tpumpmon [-h] [TDPID/],[,]

TPump Error Handling

TPump uses one error table per target table. The error tables stores a copy of each row which caused the error.

These are some common errors showing up in TPump loads:

2816: Failed to insert a duplicate row into TPump Target Table.

The error happens when TPump discovered a duplicate row. Only the first row will be inserted, duplicates will be discarded.

2817: Activity count greater than one for TPump UPDATE/DELETE.

This error happens when there are extra rows when TPump is attempting an UPDATE or DELETE.

2818: Activity count zero for TPump UPDATE or DELETE.

This error shows that the UPDATE or DELETE did not occur.

TPump Restartability

TPump is fully restartable as long as the log table and error tables are not dropped.

TPump Example Script

.logtable TheDatabase.tpumplog;
.logon DWHPRO,******;
database TheDatabase;
.name test;
.begin load errortable ET_CUST
sleep 5
checkpoint 5
sessions 8
errlimit 4
pack 4

tenacity 4
serialize on;

specify a PRIMAY KEY in the .FILED command */
.layout mylayout;
.field CUSTID * varchar(5) key;
.field CUSTNAME * varchar(30);
.dml label INST;
insert into CUSTOMER
(CUSTID,CUSTOMER_NAME
)
values
(:CUSTID,
:CUSTNAME
);
.import infile CUSTDATA.txt
format vartext ‘,’
layout mylayout
apply INST;
.end load;
.logoff;

Teradata Quiz
Teradata Quiz
Price: Free

The Fundamentals of FastLoading on Teradata

 

Our Reader Score
[Total: 4    Average: 3/5]
Teradata – An Introduction to TPump written by Roland Wenzlofsky on June 1, 2017 average rating 3/5 - 4 user ratings

LEAVE A REPLY

Please enter your comment!
Please enter your name here