fbpx

TPT Teradata – Introduction

Those of us who are working for many years with Teradata and its utilities know that the Teradata Parallel Transporter Utility (TPT) merges the functionalities of Fastload, Multiload, TPUMP, BTEQ, and Fastexport into one utility.

Teradata attempted to create a common tool in the past, but I assume many of us did not even notice it, as it was an epic fail, called “The Teradata Warehouse Builder”.

I can’t remember anybody ever using it at all. Nevertheless, scripts written in Teradata Warehouse Builder are executable with TPT without any changes.

While the syntax of the standalone utilities is not consistent, TPT offers a uniform syntax for all loading, update, delete and extract tasks (and some other tasks, like DDL statements, execution of Linux shell scripts, etc.)

The base of TPT for Teradata is the concepts of data streams and operators.

Data streams are not directly accessible from your scripts. They are the pipelines between operators, and they are kept in memory. No data is written to the disks.

Operators read data from a source (which could be a data stream or any other valid source like a flat-file or ODBC connection) or write data to a target (which again can be a data stream or any valid target like a table or a flat file). Some operators take over more tasks, such as dropping and creating tables. We will describe each type of operator in detail later in this article.

Without TPT, we would probably be using Linux pipes to make in-memory pipelining between files and the load utilities.

For example, one shell script could be writing a flat file into a named pipe, while at the same time a Fastload script would be reading from this named pipe:

cat thefile > named_pipe &
cat named_pipe | fastload

Although TPT uses naming conventions and concepts different from the standalone tools, we can most easily find an equal TPT operator for each traditional tool; TPT combines all standalone tools and more features.

TPT Teradata – Operators Overview

TPT operators are grouped into producer operators (read operators), filter operators, and consumer operators (write operators).

Producer operators read data from various data sources and make them available in a data stream for consumer operators (by reading from flat files, ODBC sources, SQL select statements, export SQL).

Consumer operators read data from a data stream and write it to a target table or a flat-file.

Producer operators and consumer operators can use access modules. Access modules are software modules used to read from data stores such as CD, DVD, and tape drives.

The following access modules are available: Named pipes (for reading from Unix named pipes), WebSphere MQ (for reading from IBM message queues), JMS.  The user can implement additional access modules.

For a Teradata beginner, it isn’t easy to distinguish between consumers and producers. Maybe the easiest way to differentiate is to memorize the following:

Producer operators never write into the target, only into a data stream. Consumer operators never write into a data stream, only directly into a target.

The data streams connect operators (standalone operators we will cover later):

TPT_Simple

As you can see in the above picture, TPT covers the complete ETL chain.

The table below shows how TPT replaces the most used standalone utilities:

TPT operatorStandalone utilityTask
DDL operatorBTEQExecutes DDL, DCL, and self-contained DML SQL statements
Export operatorFastExportExports data from Teradata
Load operatorFastLoadLoads an empty table in block mode
ODBC operatorOLE DB Access ModuleExports data from ODBC data source
OS Command operator.OS command in BTEQExecutes Linux commands
SQL Inserter operatorBTEQTransactionally inserts data into a Teradata table
SQL Selector operatorBTEQSQL SELECT from Teradata
Stream operatorTpumpTransactionally loads Teradata tables
Update operatorMultiLoadUpdates, inserts, and deletes rows

The most crucial difference between the standalone utilities and TPT is the level of parallelism. Traditionally the utilities have been strictly used sequentially. TPT offers parallelism by sharing the data streams and running of several operators in parallel:

TPT_Parallel

Probably the most significant difference between the standalone utilities and TPT is the level of parallelism. Traditionally, the utilities have been strictly used sequentially. TPT offers parallelism by sharing the data streams, and the possibility to run several operators in parallel:

In the above example, two producer operators are running parallel, writing into a typical data stream. Simultaneously, two consumer operators are reading in parallel from the data stream and writing into the Teradata database table. Such a setup would need a lot of programming (Linux shell scripts, etc.) if implemented with the standalone utilities.
We will now get more into detail by showing you which operators exist and how the standalone tools made it into TPT operators:

tpt teradata

The Producer Operators

Producer operators read data from a valid data source and make it available for consumer operators in a data stream.

The Data Connector Operator (DATACONNECTOR PRODUCER):

The Data Connector Operator is a two-way operator,  either used as a producer operator or a consumer operator.

When the type is DATACONNECTOR PRODUCER, it’s a producer operator and used to read data from flat files or an access module, pushing the data into a data stream.

It can read from a single flat file (similar to the file=”filename” statement in a Fastload). Furthermore, all files of a specific directory matching a wildcard pattern can be read at once (i.e., treated as a single input file).

Apart from directly reading from flat files, INMOD adapters can be used to push the data to the consumer operator (Fastload INMOD and Multiload INMOD).

The Export Operator (EXPORT):

This operator replaces the Fastexport utility. It reads data from a Teradata table (using a SQL SELECT statement) and pushes it into a data stream: It’s a producer operator. It puts the data into a data stream, not directly into a flat-file!

The SQL Selector Operator (SELECTOR):

This operator produces data by executing a SQL SELECT statement. The data are written into a data stream. This operator is comparable to the BTEQ export.

The ODBC Operator (ODBC):

The ODBC operator produces data by reading from an ODBC data source and writing it into a data stream.

Consumer Operators

Consumer operators read data from a data stream and write it into a target, either a table or a flat-file. Access modules are also valid targets. Consumer operators read from data streams and write to targets.

Consumer operators correspond with their standalone utilities.

The Data Connector Operator (CONSUMER):
When defined as a consumer type, the Data Connector Operator is used to write into a flat-file. Even access modules are utilized as the target.

The Load  Operator (LOAD):
This operator offers the block level load functionality we can find in a Fastload.
The Update Operator (UPDATE):
This operator provides the enhanced block-level update functionality we can find in a Multiload.

The Stream Operator (STREAM):
This operator implements TPUMP functionality.

The SQL Inserter Operator (INSERTER):
This operator performs the transactional BTEQ INSERT functionality.

The Fastexport OUTMOD (FASTEXPORT OUTMOD):
This operator allows for usage of the Fastexport OUTMOD adapter.

Filter Operators

We use this operator to apply filtering on the data stream.

The TPT scripts allow us to invoke user-written filters (C operator, C++), WHERE clauses, and CASE DML expressions in APPLY statements.

Standalone Operators

The OS Command Operator (OS COMMAND):

We operator to execute Linux commands. It replaces the functionality BTEQ offers with the “.OS” command

The DDL Operator (DDL):

We use this operator to execute DDL statements. It’s useful for tasks, such as dropping or creating tables and indexes before the data load occurs.

While this operator’s name might be somehow misleading, it allows for any SQL statements that don’t return a result set.

We can use, for example statements like INSERT…SELECT, UPDATE, and DELETE.

The Update Operator (UPDATE):
When used for deletion, it’s the replacement for an optimized Multiload DELETE.

TPT is tightly coupled to the standalone utilities, offering more functionality on top of the other tools Fastload, Multiload, and Bteq. 

Example scripts that help you to start with TPT can be found here:

__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"62516":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"62516":{"val":"var(--tcb-skin-color-0)"}},"gradients":[]},"original":{"colors":{"62516":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"b4fbe":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"b4fbe":{"val":"rgb(241, 99, 52)"}},"gradients":[]},"original":{"colors":{"b4fbe":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
Previous Article
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"b4fbe":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"b4fbe":{"val":"rgb(241, 99, 52)"}},"gradients":[]},"original":{"colors":{"b4fbe":{"val":"rgb(19, 114, 211)","hsl":{"h":210,"s":0.83,"l":0.45}}},"gradients":[]}}]}__CONFIG_colors_palette__
Next Article
__CONFIG_colors_palette__{"active_palette":0,"config":{"colors":{"62516":{"name":"Main Accent","parent":-1}},"gradients":[]},"palettes":[{"name":"Default Palette","value":{"colors":{"62516":{"val":"rgb(255, 0, 0)"}},"gradients":[]}}]}__CONFIG_colors_palette__
GET OUR TERADATA BOOK

Roland Wenzlofsky

Roland Wenzlofsky is an experienced freelance Teradata Consultant & Performance Trainer. Born in Austria's capital Vienna, he is building and tuning some of the largest Teradata Data Warehouses in the European financial and telecommunication sectors for more than 20 years. He has played all the roles of developer, designer, business analyst, and project manager. Therefore, he knows like no other the problems and obstacles that make many data warehouse projects fail and all the tricks and tips that will help you succeed.

  • Avatar
    venkatesh says:

    its really informative..a working example could have made it even better

    • Roland Wenzlofsky
      Roland Wenzlofsky says:

      There is a link in the article now pointing to example scripts.

  • Avatar
    bob mierzwinski says:

    Very nice intro to a set of powerful and oftentimes, not used enough, functionality. Very nicely done

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like

    >