TPump stands for Teradata Parallel Data Pump.
Teradata TPump does not move data in large blocks like Fastload or Multiload. Instead, it loads one row at a time, using row hash locks. These row hash locks allow TPump to perform many concurrent INSERTs and UPDATEs on a table.
Teradata TPump is not designed to bulk load huge amounts of data at once (and as fast as possible) but allows a trickle-feed of data to flow into the database.
Real-time loading: Transactional systems need to get their data into the data warehouse as soon as possible. TPump allows near real-time updates from source systems into Teradata.
Throttling: When loading with TPump, you can define how many statements should be executed per minute. This is called the statement rate.
The statement rate can even be changed during the execution of a TPump load job. You might decide to speed up the load during the nightly batch window and slow it down (decreasing resource consumption) during the day when business users are running their reports.
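A minimal sketch of how the statement rate can be set in the .BEGIN LOAD command (the session, pack, and rate values below are purely illustrative and have to be adjusted to your workload):
.begin load
errortable ET_CUST
sessions 8     /* number of TPump sessions (illustrative) */
pack 20        /* statements packed into one request (illustrative) */
rate 600;      /* statement rate: at most 600 statements per minute (illustrative) */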
DML Functionality: TPump can do DML functions (like Multiload). This includes INSERT, UPDATE and DELETE statements.
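As an illustration only, a .DML label can also combine an UPDATE and an INSERT into an upsert; the CUSTOMER table and its columns are simply the ones used in the example script further below:
.dml label UPSRT do insert for missing update rows;
update CUSTOMER set CUSTNAME = :CUSTNAME where CUSTID = :CUSTID;
insert into CUSTOMER (CUSTID, CUSTNAME) values (:CUSTID, :CUSTNAME);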
Advantages over Bulk Loading with Fastload or Multiload: TPump allows both USI and NUSI on the target table. While Fastload requires that the target table is empty, TPump can, like Multiload, load into a populated table.
Unlike Fastload and Multiload, TPump allows duplicate rows. Furthermore, you don't have to drop triggers or referential integrity constraints before loading.
Teradata TPump comes with a tool called the TPump Monitor. This tool allows you to check the status of TPump jobs in real-time and to change the statement rate of running jobs.
You can start the monitor under Linux with the following command:
tpumpmon [-h] [TDPID/]<UserName>,<Password>[,<AccountId>]
TPump uses one error table per target table. The error table stores a copy of each row that caused an error.
These are some common errors showing up in TPump loads:
2816: Failed to insert a duplicate row into TPump Target Table.
This error happens when TPump discovers a duplicate row. Only the first row will be inserted; the duplicates will be discarded.
2817: Activity count greater than one for TPump UPDATE/DELETE.
This error happens when an UPDATE or DELETE issued by TPump affects more than one row.
2818: Activity count zero for TPump UPDATE or DELETE.
This error indicates that the UPDATE or DELETE did not affect any row.
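To get a quick overview of what went wrong, you can query the error table directly. This is only a sketch, assuming the usual TPump error table columns ErrorCode and ErrorMsg (verify the column names on your system):
select ErrorCode, ErrorMsg, count(*) as ErrorRows
from ET_CUST
group by ErrorCode, ErrorMsg
order by ErrorRows desc;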
TPump is fully restartable as long as the log table and error tables are not dropped.
.begin load errortable ET_CUST;
.layout CUSTLAYOUT;   /* layout name added so the .FIELD commands have a context */
/* specify a PRIMARY KEY in the .FIELD command */
.field CUSTID * varchar(5) key;
.field CUSTNAME * varchar(30);
.dml label INST;
insert into CUSTOMER (CUSTID, CUSTNAME) values (:CUSTID, :CUSTNAME);
.import infile CUSTDATA.txt
format vartext ','
layout CUSTLAYOUT
apply INST;
.end load;
When do you need a fire brigade most? When your house is on fire!
When is the best time to establish a fire brigade?
No, not when your house is on fire, but before the city in which the house stands is even founded.
Finally, when should there be a fire safety inspection?
Before and while you construct and occupy the building, not after the fact, when entire wings have to be torn down and rebuilt.
The same is true for performance specialists and Data Warehouses.
Performance Specialists are a good fit for your Data Warehouse team at the inception of a data warehouse project and at any later stage.
This is not lobbying, but an insight from years of experience in Data Warehousing.
If you think that engaging such a specialist is too costly until pressing real-life problems justify his or her presence, you are essentially saying that you are willing to drive up the probability of project failure.
Hasty project starts become costly at exponential rates!
Many of the design and implementation decisions have immediate and often far-reaching effects on how smooth or cumbersome it will become to operate the data warehouse.
Data warehouse team members will and should be focused on other issues such as database administration, the business model behind all the data, financial aspects of the project, SQL development, load job scheduling, and many more.
The Performance Specialist looks ahead, and in between, to identify bottlenecks and tsunamis before they hit you.
Preventing sunk costs and wrong turns returns a dollar for every dime spent!
A project member with such an explicit focus can co-determine where the entire project will be covered one year from now: on page 1 of the corporation's success-story magazine or on the project funeral service announcement pages.
It is of vital importance to you as well as to your clients that there is a common understanding of the strategic dimension of a performance tuning endeavor before any debate over measurements and results takes place.
Performance optimization always has the following strategic goals:
A measurable improvement: Where and how this improvement is achieved is a matter of the problems at hand and the constraints under which one operates, but there is no meaningful claim of an improvement without a demonstration that a quantifiable, intentional change took place.
Solving an actual problem: Performance work is goal-oriented. Therefore, a finite set of measures can only be claimed a performance success if it solved a problem that was there in the first place.
A lasting effect: An improvement can only be called such if its effects are not short-lived or a mere shift of trouble from one end of the Data Warehouse to the other, and if they last irrespective of who operates the Data Warehouse at the moment.
Teradata is a very complex system. You will often be tempted to change several parameters at the same time to speed up the tuning process. Don't do it. You might solve the problem but not know which change fixed it. If the problem shows up again, you will have to start your investigations over.
Keeping the execution plans aims in the same direction as the previous point. Most of the time, especially when tuning SQL statements, it is the execution plan that changes together with your tuning activities. It is not satisfying to improve the performance of a query without knowing what caused the improvement.
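A simple way of keeping the plans is to save the EXPLAIN output of the statement before and after each change; the query below is only a placeholder:
explain
select CUSTNAME
from CUSTOMER
where CUSTID = '10001';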
Another essential task is to keep complete documentation of the changes you made. These changes probably have to make it into your production system at some point. Handing over full documentation of the changes to the DBA ensures that all of them will be applied.
Workarounds are most times faster to implement, but in the end, a good PDM (physical data model) pays off. As a general rule, fix problems as early as possible in your ETL/ELT chain.
Many aspects of tuning can be tested on any system, but certain details can only be tested on the target system. The target system may surprise you with different execution plans and bottlenecks, such as being IO or CPU bound.
Absolute measures of CPU time and IOs have to be your primary criteria when evaluating your improvements. Execution times will depend on the overall system workload and many parameters which you can't influence.
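As a sketch, the CPU time and I/O counts of your test queries can be taken from the query log, assuming DBQL is enabled and you have read access to DBC.QryLogV; the user name is just a placeholder:
select QueryID, StartTime, AMPCPUTime, TotalIOCount
from DBC.QryLogV
where UserName = 'TUNING_USER'   /* placeholder for the user running the test queries */
order by StartTime desc;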
Often, bad performance is assumed to be caused by Teradata, but it is not. Slow running reports could be caused by a slow network connection between Teradata and the reporting server. You have to investigate all related systems, such as ETL servers (Datastage, ODI, Informatica, Ab Initio), reporting servers, Unix load nodes, etc.
To help each other, I would like to create a global database containing the characteristics of Teradata systems all over the world, allowing each of us to compare our own system with others; such a comparison is helpful when analyzing performance issues.
I am starting with a simple report that only shows, for each hour of the day, the CPU idle time and the disk I/O waits.
If you would like to help us build such a database, you will find below the SQL statement that has to be executed on your system:
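A minimal sketch of such a query, assuming the standard layout of DBC.ResUsageSpma, where TheTime is formatted as HHMMSS and the counters CPUIdle and CPUIoWait are available (adjust it if your site stores the data differently):
select cast(TheTime / 10000 as integer) as TheHour,   /* hour of the day */
       sum(CPUIdle)   as CpuIdleTime,                 /* CPU idle time */
       sum(CPUIoWait) as DiskIoWaits                  /* CPU time spent waiting for disk I/O */
from DBC.ResUsageSpma
group by 1
order by 1;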
The query should run within a couple of seconds and will return 24 records (one record per hour).
One hint: Usually, the content of the table DBC.ResUsageSpma is removed regularly and stored in a backup table. I assume you know best how this is implemented on your system and where to find it.
If possible, please add information about the kind of system, number of nodes and AMPs.
Please send your results to [email protected] (plain text or Excel Sheet).
I will collect all results and make them available online (charts & tables). Here is our current collection:
Please let me know as well which other measures would be interesting from your point of view. I imagine that such a database of measures can be helpful for each of us.
Thanks and Best Regards,