Teradata MERGE INTO – the best tricks

Roland Wenzlofsky

June 18, 2022


Teradata MERGE INTO vs. UPDATE

In this article, we compare an UPDATE statement with a MERGE INTO statement and work out the differences in performance. We also show a MERGE INTO statement’s limitations compared to an UPDATE statement.

Considering performance alone, the Teradata MERGE INTO statement has some properties that positively affect the number of IOs and thus the performance:

  • MERGE INTO does not require spool space
  • MERGE INTO typically consumes fewer CPU seconds
  • MERGE INTO typically consumes fewer IOs

The advantage of MERGE INTO in lower IOs is that Teradata must process each data block only once, and no spool is built up. The benefit is even more significant if secondary indexes are present on the table since these are also processed block by block.

To demonstrate the performance benefits, we have created a test scenario consisting of two tables that both contain approximately 3.5 million rows and have the same primary index as well as the same partitioning:

CREATE MULTISET TABLE DWHPro.Sales4 ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
SalesId INTEGER NOT NULL,
Quantity DECIMAL(18,2),
SalesDate DATE FORMAT 'YYYY-MM-DD')
PRIMARY INDEX ( SalesId )
PARTITION BY RANGE_N(SalesDate BETWEEN DATE '1900-01-01' AND DATE '2022-12-31' EACH INTERVAL '1' MONTH ,NO RANGE);

CREATE MULTISET TABLE DWHPro.Sales5 ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
SalesId INTEGER NOT NULL,
Quantity DECIMAL(18,2),
SalesDate DATE FORMAT 'YYYY-MM-DD')
PRIMARY INDEX ( SalesId )
PARTITION BY RANGE_N(SalesDate BETWEEN DATE '1900-01-01' AND DATE '2022-12-31' EACH INTERVAL '1' MONTH ,NO RANGE);
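The article does not show how the test rows were generated. One possible sketch, using the system calendar view Sys_Calendar.CALENDAR to produce one row per day (the RANDOM quantities are placeholders; the actual test data had about 3.5 million rows):

INSERT INTO DWHPro.Sales5 (SalesId, Quantity, SalesDate)
SELECT
    c.day_of_calendar,                               -- unique INTEGER per calendar day
    CAST(RANDOM(1, 100000) AS DECIMAL(18,2)) / 100,  -- placeholder quantity
    c.calendar_date
FROM Sys_Calendar.CALENDAR c
WHERE c.calendar_date BETWEEN DATE '1900-01-01' AND DATE '2022-12-31';

-- Copy the rows so that every row in Sales4 has a matching row in Sales5
INSERT INTO DWHPro.Sales4 SELECT * FROM DWHPro.Sales5;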

Next, we performed the same update twice: first with an UPDATE statement, then with MERGE INTO:

UPDATE Sales4 FROM Sales5 SET Quantity = Sales5.Quantity 
WHERE  
     Sales4.SalesID=Sales5.SalesID AND 
     Sales4.SalesDate=Sales5.SalesDate
;

Let us first analyze the execution plan of the UPDATE statement:


4) We do an all-AMPs JOIN step from DWHPRO.Sales5 by way of a RowHash
match scan with no residual conditions, which is joined to
DWHPRO.Sales4 by way of a RowHash match scan with no residual
conditions. DWHPRO.Sales5 and DWHPRO.Sales4 are joined using a
rowkey-based merge join, with a join condition of (
"(DWHPRO.Sales4.SalesId = DWHPRO.Sales5.SalesId) AND
(DWHPRO.Sales4.SalesDate = DWHPRO.Sales5.SalesDate)"). The input
tables DWHPRO.Sales5 and DWHPRO.Sales4 will not be cached in
memory, but DWHPRO.Sales5 is eligible for synchronized scanning.
The result goes into Spool 1 (all_amps), which is built locally on
the AMPs. Then we do a SORT to order Spool 1 by the sort key in
spool field1 (DWHPRO.Sales4.ROWID). The result spool file will
not be cached in memory. The size of Spool 1 is estimated with
low confidence to be 3,461,454 rows (89,997,804 bytes). The
estimated time for this step is 1 minute and 5 seconds.
5) We do a MERGE Update to DWHPRO.Sales4 from Spool 1 (Last Use) via
ROWID. The size is estimated with low confidence to be 3,461,454
rows (100,382,166 bytes). The estimated time for this step is 1
minute and 52 seconds.
-> No rows are returned to the user as the result of statement 1.
The total estimated time is 2 minutes and 57 seconds.

As we can see above, an AMP-local spool is created, which must be sorted by ROWID. Both steps cause a lot of IOs, especially with large tables.

In comparison, let’s look at the same task solved using MERGE INTO:

MERGE INTO Sales4 USING Sales5 
ON
Sales4.SalesID=Sales5.SalesId AND
Sales4.SalesDate=Sales5.SalesDate
WHEN MATCHED THEN UPDATE SET Quantity=Sales5.Quantity;

The following is the Execution Plan for the MERGE INTO statement:

1) First, we lock a distinct DWHPRO."pseudo table" for read on a
RowHash to prevent global deadlock for DWHPRO.Sales5.
2) Next, we lock a distinct DWHPRO."pseudo table" for write on a
RowHash to prevent global deadlock for DWHPRO.Sales4.
3) We lock DWHPRO.Sales5 for read, and we lock DWHPRO.Sales4 for
write.
4) We do an all-AMPs merge with matched updates into DWHPRO.Sales4
from DWHPRO.Sales5 with a condition of ("(DWHPRO.Sales4.SalesId =
DWHPRO.Sales5.SalesId) AND (DWHPRO.Sales4.SalesDate =
DWHPRO.Sales5.SalesDate)"). The number of rows merged is
estimated with no confidence to be 3,461,454 rows.
-> No rows are returned to the user as the result of statement 1.

After we have executed both variants, we can now compare the required resources:

             Logical IOs   CPU Seconds
UPDATE          5,991.00         5.973
MERGE INTO      4,441.00         3.570

MERGE INTO vs. UPDATE

As we can see, the MERGE INTO statement requires about 26% fewer IOs and 40% fewer CPU seconds than the UPDATE statement. By solving the task with MERGE INTO instead of UPDATE, we save considerable resources.
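Figures like these can be taken from the database query log, assuming DBQL logging is enabled on the system. A sketch using the DBC.QryLogV view (the LIKE patterns are placeholders matching our two test statements):

SELECT
    QueryText,
    TotalIOCount,                                    -- logical IOs
    AMPCPUTime,                                      -- CPU seconds
    SpoolUsage / (1024 * 1024 * 1024.00) AS SpoolGB  -- peak spool in GB
FROM DBC.QryLogV
WHERE UserName = USER
  AND (QueryText LIKE 'UPDATE Sales4%' OR QueryText LIKE 'MERGE INTO Sales4%')
ORDER BY StartTime DESC;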

We have already briefly mentioned that secondary indexes on the affected table are also maintained block by block by MERGE INTO, not row by row. The following test scenario shows that in such cases the resource savings shift even further towards the MERGE INTO statement:

CREATE INDEX (Quantity) ON Sales4; -- Create a NUSI

We changed only one thing in this test: we put a single NUSI on the table being updated.

             Logical IOs   CPU Seconds
UPDATE         12,439.00        11.103
MERGE INTO      4,459.00         4.460

MERGE INTO vs. UPDATE with NUSI

Looking at the resource figures in the query log, we can see that the IOs and CPU seconds required by the UPDATE statement have roughly doubled compared to the test without the index. The CPU seconds and IOs for the MERGE INTO statement are almost unchanged, i.e., maintaining the NUSI requires hardly any additional resources. We are now talking about 64% fewer IOs for the MERGE INTO statement, which is considerable. We could extend this example further and add more indexes to the table, but we leave this to the interested reader. Finally, we should note that the advantage depends on the size of the tables and the number of changed rows. MERGE INTO may not always be the better choice, especially since some restrictions make it impossible to substitute it for an UPDATE statement.

Since MERGE INTO can execute INSERT and UPDATE logic in a single statement, it can also outperform separate INSERT and UPDATE statements.
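Applied to our test tables, such a combined upsert could look as follows: rows of Sales5 that already exist in Sales4 are updated, and all others are inserted in the same pass:

MERGE INTO Sales4 USING Sales5
ON
    Sales4.SalesId = Sales5.SalesId AND
    Sales4.SalesDate = Sales5.SalesDate
WHEN MATCHED THEN
    UPDATE SET Quantity = Sales5.Quantity
WHEN NOT MATCHED THEN
    INSERT (SalesId, Quantity, SalesDate)
    VALUES (Sales5.SalesId, Sales5.Quantity, Sales5.SalesDate);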

But now we come to a significant limitation of MERGE INTO: it can only be used on row-partitioned tables if all partitioning columns also appear in the join condition of the two tables. For a better understanding, we take our two tables from the previous test scenario.

PRIMARY INDEX ( SalesId ) PARTITION BY RANGE_N(SalesDate BETWEEN DATE '1900-01-01' AND DATE '2022-12-31' EACH INTERVAL '1' MONTH ,NO RANGE);

The primary index is the column SalesId, and the partitioning column is SalesDate. Therefore, a MERGE INTO statement whose join condition omits SalesDate, as shown below, is not possible and leads to an error:

MERGE INTO Sales4 USING Sales5
ON
Sales4.SalesId=Sales5.SalesId  -- SalesDate is missing from the join condition, so we get an error
WHEN MATCHED THEN UPDATE SET Quantity=Sales5.Quantity;

Suppose the column SalesId is a surrogate key whose generation already incorporates SalesDate. In that case, we could redundantly include SalesDate in the join without changing the result. But this does not always make sense, for example, if it means giving up the source independence of our data. It is therefore not advisable to blindly switch all UPDATE statements to MERGE INTO; we should always consider what effect this has on our data warehouse design.
