A Teradata HASHROW Table Difference Screening Test

Have you ever experienced extended waiting times for a table comparison to yield results? Have you ever been compelled to halt and defer quality checks on sizable tables owing to excessive resource utilization during a previous attempt?

What if you possessed a straightforward screening test that, out of a huge set of tables, tells you which require further examination and which are acceptable and can be excluded from the comprehensive analysis? And what if this test practically cannot produce false negatives, meaning no table requiring inspection will be overlooked?

Such a test exists and consists of nothing more than using the Teradata HASHROW function!

Imagine the following situation:

A copy of a large table has been created. A little later, you wish to check whether a new quality maintenance job has led to corrections to the table content. No rows were added or removed, so a simple row count is not helpful. Assume there is no easy way around a full table content comparison between the before and after data. Imagine further that this is what you have to perform over hundreds of tables.

The direct approach is to compare tables THE_COPY and THE_ORIGINAL against each other with plain SQL.

SELECT * FROM DWH.THE_COPY
MINUS
SELECT * FROM DWH.THE_ORIGINAL
;

Alternatively, you can calculate the HASHROW() value for every row and compare the two lists of hash values against each other:

SELECT HASHROW( column1, column2, …, columnN) FROM DWH.THE_COPY
MINUS
SELECT HASHROW( column1, column2, …, columnN) FROM DWH.THE_ORIGINAL
;

Whereas the direct approach shows you the deviating rows in full and is more convenient for small tables or smaller row subsets, it can become costly or even prohibitively expensive for large tables. The HASHROW() approach has two advantages: first, we operate with only one 4-byte hash value per row instead of combining all attributes together; second, before comparing, we avoid uncompressing and recopying the actual table content across AMPs.
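The screening idea can be sketched outside the database as well. The following Python snippet uses zlib.crc32 as a stand-in 32-bit row hash (HASHROW also returns 4 bytes, but uses a different, proprietary algorithm) and compares the per-row hash multisets instead of the full rows; the table data is made up for illustration.

```python
import zlib
from collections import Counter

def row_hash(row):
    """Hash one row by concatenating its column values.
    Stand-in for HASHROW(column1, ..., columnN)."""
    key = "|".join(str(col) for col in row)
    return zlib.crc32(key.encode("utf-8"))

def tables_may_differ(table_a, table_b):
    """Screening test on the multisets of per-row hashes.
    False -> the tables are (almost certainly) identical.
    True  -> at least one row differs; run the full comparison."""
    return Counter(map(row_hash, table_a)) != Counter(map(row_hash, table_b))

# Hypothetical example data.
original = [(1, "Smith", 100.0), (2, "Jones", 250.0)]
copy_ok  = [(1, "Smith", 100.0), (2, "Jones", 250.0)]
copy_bad = [(1, "Smith", 100.0), (2, "Jones", 999.0)]

print(tables_may_differ(original, copy_ok))   # False: skip the full check
print(tables_may_differ(original, copy_bad))  # True: inspect this table
```

Only tables flagged True would then go through the expensive full-row MINUS comparison.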

We tested our hypothesis on a 120 GB table that had already been partially compressed. By defining query bands around each comparison, we measured the resource consumption of both methods and observed the following differences:

METHOD    | TotalIOCount | AMPCPUTime | MaxCPUAmpNr | MinAmpIO  | MaxAmpIO  | MaxIOAmpNr | SpoolUsage
PLAIN SQL | 201,755,716  | 70,641     | 8           | 2,238,616 | 2,247,648 | 10         | 1,786,095,099,904
HASHROW   | 17,184,333   | 33,249     | 19          | 190,144   | 192,249   | 131        | 15,833,006,592

As can be seen clearly, the HASHROW() approach used only a fraction of the resources of the plain SQL equivalent. Note that we performed this test under ideal conditions: no other parallel activity and a technical user with enough spool space to cope with such a large intermediate result. We waited over two hours for the first version to finish, while the second one completed in half the time.
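Reading the measurements with thousands separators, the savings factors can be computed directly:

```python
# Back-of-the-envelope ratios from the measurements above.
plain   = {"io": 201_755_716, "cpu": 70_641, "spool": 1_786_095_099_904}
hashrow = {"io": 17_184_333,  "cpu": 33_249, "spool": 15_833_006_592}

for metric in ("io", "cpu", "spool"):
    ratio = plain[metric] / hashrow[metric]
    print(f"{metric}: PLAIN SQL / HASHROW = {ratio:.1f}x")
```

So the hash-based screening needed roughly a twelfth of the IO, half the CPU, and around a hundredth of the spool space.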


In summary, the HASHROW() method we introduced offers a concise initial distinction between unaltered tables and those impacted by the modification. Although hash collisions may occur, with distinct value combinations producing the same hash value, such a miss is a rare possibility.
