Teradata TPump


What is Teradata TPump?

TPump stands for Teradata Parallel Data Pump.

Teradata TPump does not move data in large blocks like FastLoad or MultiLoad. Instead, it loads one row at a time, using row hash locks. Row hash locks allow TPump to run many concurrent INSERT and UPDATE statements against the same table.

Teradata TPump is not designed to bulk-load huge amounts of data at once (and as fast as possible); instead, it allows a trickle-feed of data to flow into the database.

Teradata TPump Features

Real-time loading: Transactional systems need to get their data into the data warehouse as soon as possible. TPump allows near real-time updates from source systems into Teradata.

Throttling: When loading with TPump, you can define how many statements should be executed per minute. This is called the statement rate.

The statement rate can even be changed during the execution of a TPump load job. You might decide to speed up the load during the nightly batch window and slow it down (decreasing resource consumption) during the day, when business users are running their reports.
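As a sketch, the statement rate is set with the RATE parameter of the .BEGIN LOAD command (all values below are purely illustrative):

.begin load errortable ET_CUST
rate 600        /* at most 600 statements per minute */
sessions 8
serialize on;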

DML Functionality: Like MultiLoad, TPump supports DML operations. This includes INSERT, UPDATE, and DELETE statements, as well as combinations of them such as upserts.
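For illustration, here is a sketch of an upsert coded as a .DML label (table and column names are taken from the example script at the end of this article):

.dml label UPSCUST do insert for missing update rows;
update CUSTOMER set CUSTOMER_NAME = :CUSTNAME where CUSTID = :CUSTID;
insert into CUSTOMER (CUSTID, CUSTOMER_NAME) values (:CUSTID, :CUSTNAME);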

Advantages over Bulk Loading with FastLoad or MultiLoad: TPump allows both unique (USI) and non-unique (NUSI) secondary indexes on the target table. While FastLoad requires an empty target table, TPump, like MultiLoad, can load into a populated table.

Unlike FastLoad and MultiLoad, row duplicates are allowed. Furthermore, you don't have to drop triggers or referential integrity constraints before loading.

The most important TPump Limitations

  • No concatenation of input data files is allowed.
  • TPump can't handle aggregates, arithmetic functions, or exponentiation.
  • The SELECT statement is not allowed.
  • No more than four IMPORT commands may be used in a single load task. This means that at most four files can be read directly in a single run.
  • TPump performance will decrease if access logging is enabled on the target objects.

Monitoring TPump

Teradata TPump comes with a tool called the TPump Monitor. This tool allows you to check the status of TPump jobs in real time and to change the statement rate of running jobs.

You can start the monitor under Linux with a command of the following form (the logon values are placeholders):

tpumpmon [-h] [TDPID/]userid,password[,account]

TPump Error Handling

TPump uses one error table per target table. The error table stores a copy of each row that caused an error.

These are some common errors showing up in TPump loads:

2816: Failed to insert a duplicate row into TPump Target Table.

This error occurs when TPump discovers a duplicate row. Only the first row is inserted; duplicates are discarded.

2817: Activity count greater than one for TPump UPDATE/DELETE.

This error occurs when more than one row qualifies for the UPDATE or DELETE that TPump is attempting.

2818: Activity count zero for TPump UPDATE or DELETE.

This error shows that the UPDATE or DELETE affected no rows.
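To see what went wrong, you can query the error table directly. A minimal sketch, assuming the error table ET_CUST from the example script below and the standard TPump error table columns ErrorCode and ErrorMsg:

SELECT ErrorCode, ErrorMsg, COUNT(*) AS ErrorRows
FROM TheDatabase.ET_CUST
GROUP BY 1,2
ORDER BY 3 DESC;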

TPump Restartability

TPump is fully restartable as long as the log table and error tables are not dropped.

TPump Example Script

.logtable TheDatabase.tpumplog;
.logon DWHPRO,******;
database TheDatabase;
.name test;
.begin load errortable ET_CUST
sleep 5
checkpoint 5
sessions 8
errlimit 4
pack 4
tenacity 4
serialize on;

/* when SERIALIZE is ON, specify a PRIMARY KEY in the .FIELD command */
.layout mylayout;
.field CUSTID * varchar(5) key;
.field CUSTNAME * varchar(30);
.dml label INST;
insert into CUSTOMER
(CUSTID,
CUSTOMER_NAME)
values
(:CUSTID,
:CUSTNAME);
.import infile CUSTDATA.txt
format vartext ','
layout mylayout
apply INST;
.end load;
.logoff;
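Assuming the script above is saved as load_customer.tpump (the file name is just an example), it can be started from the command line; TPump reads the script from standard input:

tpump < load_customer.tpump > load_customer.log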


The Fundamentals of FastLoading on Teradata


A Short Teradata Quiz for You

Here is a short Teradata Quiz for you. I hope you like it.

 

 

Welcome to the Teradata Quiz.

Each time you take the quiz, a different set of questions is asked. We are adding new questions on a regular basis. At the end of the quiz, we will show you the correct answers, your score, and the average score of all participants. These are the current questions:

1. Which statements about FALLBACK protection are true?
2. Which partition holds the NULL values in a row-partitioned table?
3. When utilizing a Unique Secondary Index, how many AMPs are usually accessed?
4. What can be the disadvantages of SET tables?
5. Locks can be placed at which three levels?
6. Is it possible that different column values are mapped to the same row hash value?
7. How many presence bytes are always available in each row?
8. In which ways is data access handled by Teradata?
9. Which option best describes Teradata's architecture?
10. Would you consider using block-level compression (BLC) on a CPU-bound system?




Engaging a Teradata Tuning Specialist is not a Luxury

When do you need a fire brigade most? When your house is on fire!

When is the best time to establish a fire brigade?

No, not when your house is on fire, but before the city in which the house stands is even founded.

Finally, when should there be a fire safety inspection?

Before and while you construct and occupy the building, not after the fact, when entire wings have to be torn down and rebuilt.

The same is true for performance specialists and Data Warehouses.

The Teradata Tuning Specialist

Performance Specialists are a good fit for your Data Warehouse team at the inception of a data warehouse project and at any later stage.

This is not lobbying, but an insight from years of experience in Data Warehousing.

If you think that engaging such a specialist is too costly before you can justify his or her presence with pressing real-life problems, you are essentially accepting a higher probability of project failure.

Hasty project starts are costly at exponential rates!

Many of the design and implementation decisions have immediate and often far-reaching effects on how smooth or cumbersome it will become to operate the data warehouse.

Data warehouse team members will and should be focused on other issues, such as database administration, the business model behind all the data, financial aspects of the project, SQL development, load job scheduling, and many more.

The Performance Specialist looks ahead and in between to identify bottlenecks and Tsunamis before they hit you.

Preventing sunk costs and wrong turns pays for itself many times over!

A project member with such an explicit focus can co-determine where the entire project will appear one year from now: on page 1 of the corporation's success story magazine, or on the project funeral service announcement pages.

It is of vital importance both to you as well as to your clients that there is a common understanding of the strategic dimension of a performance tuning endeavor before any debate over measurements and results takes place.

Performance optimization always has the following strategic goals:

  1. A net improvement of system resource usage is achieved.
  2. Key stakeholders of the Data Warehouse can achieve their goals at all, or again, or faster, or more regularly, and with less friction and intervention.
  3. Changes and improvements are sustainable in terms of time and staff, and irrespective of the presence of Performance Specialists.

Where and how this improvement is achieved is a matter of the problems at hand and the constraints under which one operates, but there is no meaningful claim of an improvement without any demonstration that a quantifiable, intentional change took place.

Performance work is goal-oriented. Therefore, a finite set of measures can only be claimed as a performance success if it solves a problem that existed in the first place.

An improvement can only be called such if its effects are not short-lived or a mere shift of trouble from one end of the Data Warehouse to the other, and if they last irrespective of who operates the Data Warehouse at the moment.

7 Essential General Teradata Tuning Tips

Some General Teradata Tuning Ideas

1. One change at a Time

Teradata is a very complex system. You will often be tempted to change several parameters at the same time to speed up the tuning process. Don't do it. You might solve the problem but not know which change fixed it. If the problem shows up again, you will have to start your investigation all over.

2. Keep before and after Explain Plans

Keeping plans serves the same purpose as point 1. Most of the time, especially when tuning SQL statements, it is the execution plan that changes together with your tuning activities. It is not satisfying to improve the performance of a query without knowing what caused the improvement (see the EXPLAIN sketch after this list), such as:

  • Is the Optimizer choosing a more suitable join type?
  • Does the Optimizer have better statistics available?
  • Does our change allow the Optimizer to use an alternative access path (secondary index, join index, etc.)?
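Capturing a plan is as simple as prefixing the statement with EXPLAIN; the query below is a hypothetical example:

EXPLAIN
SELECT CUSTID, CUSTOMER_NAME
FROM CUSTOMER
WHERE CUSTID = '10001';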

3. Keep a Documentation of all Changes

Another essential task is to keep complete documentation of the changes you made. These changes will probably have to make it into your production system somehow. Handing over complete documentation of the changes to the DBA ensures that all of them will be applied.

4. Fix the Physical Data Model, avoid Workarounds

Workarounds are usually faster to implement, but in the end, a good PDM pays off. As a general rule, fix problems as early as possible in your ETL/ELT chain (see the small example after this list), for example:

  • Take care to use suitable data types in your PDM; avoid fixing problems in late stages like data marts, reports, etc.
  • Avoid adding logic to a view layer only because you forgot to integrate it into the transformation process.
  • Don't confuse your data warehouse with an operational data store (ODS); you will force the report designers to repeat business logic over and over again in their reports.
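To illustrate the data type point, here is a small sketch (table names are hypothetical): a date stored as VARCHAR forces casts in every downstream query, while a proper DATE column keeps comparisons and partition elimination cheap.

/* problematic: the date arrives as a character string and stays one */
CREATE TABLE Sales_Staging
(
    SalesId   INTEGER,
    SalesDate VARCHAR(10)   -- '2015-01-31', must be cast everywhere
);

/* better: fix the data type early in the ETL/ELT chain */
CREATE TABLE Sales
(
    SalesId   INTEGER,
    SalesDate DATE
);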

5. Do the Final Test of your Changes on the Target System

Many aspects of tuning can be tested on any system, but certain details can only be tested on the target system. The target system may surprise you with different execution plans and bottlenecks, such as being IO-bound or CPU-bound.

6. Never use Execution Times alone to estimate your Improvements

Absolute measures of CPU time and IOs have to be your primary criteria when evaluating your improvements. Execution times depend on the overall system workload and on many parameters that you can't influence.
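One way to obtain these absolute measures is from query logging, assuming DBQL is enabled on your system (the user name below is hypothetical):

SELECT QueryID, AMPCPUTime, TotalIOCount
FROM DBC.DBQLogTbl
WHERE UserName = 'DWHPRO'
ORDER BY StartTime DESC;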

7. Don’t miss the Wood for the Trees

Often you will assume that bad performance is related to Teradata when it's not. Slow-running reports can be caused by a slow network connection between Teradata and the reporting server. You have to investigate all related systems, such as ETL servers (DataStage, ODI, Informatica, Ab Initio), reporting servers, Unix load nodes, etc.


Let’s Solve Teradata Problems Together!

Teradata administrators all over the world are fighting the same kinds of problems.

To help each other, I would like to create a global database containing the characteristics of Teradata systems all over the world, allowing each of us to compare our own system with others; this is helpful in analyzing performance issues.

I am starting with a simple report, which only shows, for each hour of the day, the CPU idle time and disk IO waits.

If you would like to help us build such a database, below is the SQL statement that has to be executed on your system:

SELECT
    EXTRACT(HOUR FROM TheTime) AS time_of_day,
    (SUM(CPUIoWait) / SUM(CPUIoWait + CPUUExec + CPUUServ + CPUIdle)) * 100 AS CPU_WAITING,
    (SUM(CPUIdle)   / SUM(CPUIoWait + CPUUExec + CPUUServ + CPUIdle)) * 100 AS CPU_IDLE
FROM
    DBC.ResUsageSpma
WHERE
    TheDate BETWEEN DATE-30 AND DATE-1
GROUP BY time_of_day
ORDER BY 1;

The query should run within a couple of seconds and will return 24 records (one record per hour).

One hint: usually, the content of the table DBC.ResUsageSpma is removed regularly and stored in a backup table. I assume you know best how this is implemented on your system and where to find the data.

If possible, please add information about the kind of system and the number of nodes and AMPs.

Please send your results to [email protected] (plain text or Excel Sheet).

I will collect all results and make them available online as charts and tables.

Please also let me know which other measures would be interesting from your point of view. I imagine that such a database of measures can be helpful for all of us.

Thanks and Best Regards,

DWH Pro

 
