Using MERGE INTO requests instead of UPDATE statements brings several performance benefits:

  • No spool required
  • Fewer CPU seconds consumed
  • Fewer disk IOs needed

The IO reduction arises because each data block is touched only once and no spooling is required during the merge operation.

The advantage of the MERGE INTO statement over the traditional UPDATE statement becomes even bigger when secondary indexes are present, because index maintenance for the MERGE INTO statement is done per data block, not per row.

As usual, I will demonstrate this with a small example, based on the following tables:

CREATE MULTISET TABLE DWHPro.Sales4 ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
SalesId INTEGER NOT NULL,
Quantity DECIMAL(18,2),
SalesDate DATE FORMAT 'YYYY-MM-DD')
PRIMARY INDEX ( SalesId )
PARTITION BY RANGE_N(SalesDate BETWEEN DATE '1900-01-01' AND DATE '2016-12-31' EACH INTERVAL '1' MONTH ,NO RANGE);

CREATE MULTISET TABLE DWHPro.Sales5 ,NO FALLBACK ,
NO BEFORE JOURNAL,
NO AFTER JOURNAL,
CHECKSUM = DEFAULT,
DEFAULT MERGEBLOCKRATIO
(
SalesId INTEGER NOT NULL,
Quantity DECIMAL(18,2),
SalesDate DATE FORMAT 'YYYY-MM-DD')
PRIMARY INDEX ( SalesId )
PARTITION BY RANGE_N(SalesDate BETWEEN DATE '1900-01-01' AND DATE '2016-12-31' EACH INTERVAL '1' MONTH ,NO RANGE);

Both tables have the same primary index and partitions and contain about 3.5 million rows.
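
How the test data was loaded is not shown here; a minimal sketch of a possible setup (assuming a hypothetical existing table DWHPro.Sales as the row source, which is not part of the original test) could look like this:

-- Hypothetical load: DWHPro.Sales is an assumed source table with about 3.5 million rows
INSERT INTO DWHPro.Sales5 (SalesId, Quantity, SalesDate)
SELECT SalesId, Quantity, SalesDate
FROM DWHPro.Sales;

-- Same keys in Sales4, but different quantities, so the update has something to change
INSERT INTO DWHPro.Sales4 (SalesId, Quantity, SalesDate)
SELECT SalesId, Quantity * 2, SalesDate
FROM DWHPro.Sales5;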

I executed two statements that do exactly the same update; the first one is a traditional UPDATE, the second one the MERGE INTO equivalent:

UPDATE Sales4 FROM Sales5
SET Quantity = Sales5.Quantity
WHERE Sales4.SalesId = Sales5.SalesId AND Sales4.SalesDate = Sales5.SalesDate;
1) First, we lock a distinct DWHPRO."pseudo table" for read on a
RowHash to prevent global deadlock for DWHPRO.Sales5.
2) Next, we lock a distinct DWHPRO."pseudo table" for write on a
RowHash to prevent global deadlock for DWHPRO.Sales4.
3) We lock DWHPRO.Sales5 for read, and we lock DWHPRO.Sales4 for
write.
4) We do an all-AMPs JOIN step from DWHPRO.Sales5 by way of a RowHash
match scan with no residual conditions, which is joined to
DWHPRO.Sales4 by way of a RowHash match scan with no residual
conditions. DWHPRO.Sales5 and DWHPRO.Sales4 are joined using a
rowkey-based merge join, with a join condition of (
"(DWHPRO.Sales4.SalesId = DWHPRO.Sales5.SalesId) AND
(DWHPRO.Sales4.SalesDate = DWHPRO.Sales5.SalesDate)"). The input
tables DWHPRO.Sales5 and DWHPRO.Sales4 will not be cached in
memory, but DWHPRO.Sales5 is eligible for synchronized scanning.
The result goes into Spool 1 (all_amps), which is built locally on
the AMPs. Then we do a SORT to order Spool 1 by the sort key in
spool field1 (DWHPRO.Sales4.ROWID). The result spool file will
not be cached in memory. The size of Spool 1 is estimated with
low confidence to be 3,461,454 rows (89,997,804 bytes). The
estimated time for this step is 1 minute and 5 seconds.
5) We do a MERGE Update to DWHPRO.Sales4 from Spool 1 (Last Use) via
ROWID. The size is estimated with low confidence to be 3,461,454
rows (100,382,166 bytes). The estimated time for this step is 1
minute and 52 seconds.
-> No rows are returned to the user as the result of statement 1.
The total estimated time is 2 minutes and 57 seconds.

As the EXPLAIN output reveals, a spool is created and has to be sorted before the MERGE update step occurs. Now let's take a look at the MERGE INTO request:

MERGE INTO Sales4 USING Sales5 ON Sales4.SalesID=Sales5.SalesId AND Sales4.SalesDate=Sales5.SalesDate
WHEN MATCHED THEN UPDATE SET Quantity=Sales5.Quantity;
1) First, we lock a distinct DWHPRO."pseudo table" for read on a
RowHash to prevent global deadlock for DWHPRO.Sales5.
2) Next, we lock a distinct DWHPRO."pseudo table" for write on a
RowHash to prevent global deadlock for DWHPRO.Sales4.
3) We lock DWHPRO.Sales5 for read, and we lock DWHPRO.Sales4 for
write.
4) We do an all-AMPs merge with matched updates into DWHPRO.Sales4
from DWHPRO.Sales5 with a condition of ("(DWHPRO.Sales4.SalesId =
DWHPRO.Sales5.SalesId) AND (DWHPRO.Sales4.SalesDate =
DWHPRO.Sales5.SalesDate)"). The number of rows merged is
estimated with no confidence to be 3,461,454 rows.
-> No rows are returned to the user as the result of statement 1.

Now let’s compare the resource usage of both statements:

            Logical IOs   CPU Seconds   Spool (GB)
UPDATE         5,991.00          5.97            3
MERGE INTO     4,441.00          3.57            0

Obviously, the MERGE INTO consumes about 26% fewer IOs and 40% fewer CPU seconds than the UPDATE statement.
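
These figures were taken from the query log (DBQL). Assuming query logging is enabled on your system and you have access to the view DBC.QryLogV, a query along the following lines returns comparable numbers for the two statements (a sketch, not the exact query used for the measurements above):

SELECT QueryText
, TotalIOCount                              -- logical IOs
, AMPCPUTime                                -- CPU seconds summed over all AMPs
, SpoolUsage / (1024*1024*1024) AS SpoolGB  -- peak spool usage, converted from bytes to GB
FROM DBC.QryLogV
WHERE UserName = USER
AND (QueryText LIKE 'UPDATE Sales4%' OR QueryText LIKE 'MERGE INTO Sales4%')
ORDER BY StartTime DESC;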

We mentioned earlier that, in the case of MERGE INTO, index maintenance is done for whole data blocks. Therefore, I will show you another example, which highlights the impact of a NUSI that has to be maintained at the same time:

CREATE INDEX (Quantity) ON Sales4;

Everything else stays the same. Here is the excerpt from the Query Log:

            Logical IOs   CPU Seconds   Spool (GB)
UPDATE        12,439.00         11.10            3
MERGE INTO     4,459.00          4.46            0

The above table shows that index maintenance was almost free with the MERGE INTO approach, while the IOs of the traditional UPDATE statement suffered considerably from maintaining the NUSI. The MERGE INTO statement needs almost 64% fewer IOs than the UPDATE statement. The NUSI did not impact spool usage.

Similar to the example above, MERGE INTO can be used to replace an INSERT…SELECT or to apply the UPDATE and INSERT steps at once, which increases performance even further.
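
For instance, an upsert against the tables above, which updates matching rows and inserts the missing ones in a single pass over the target, could look like the following sketch (not one of the measured statements):

MERGE INTO Sales4
USING Sales5
ON Sales4.SalesId = Sales5.SalesId AND Sales4.SalesDate = Sales5.SalesDate
WHEN MATCHED THEN
UPDATE SET Quantity = Sales5.Quantity
WHEN NOT MATCHED THEN
INSERT (SalesId, Quantity, SalesDate)
VALUES (Sales5.SalesId, Sales5.Quantity, Sales5.SalesDate);

Note that, as in the measured example, the ON clause specifies the primary index and the partitioning column of the target table, which is what lets Teradata apply the merge directly to the data blocks without spooling.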

Conclusion:

While MERGE INTO may not be the right choice (or may not even be possible) in every situation, you should consider it as an alternative whenever you are dealing with INSERTs and UPDATEs. It can drastically improve performance!


Roland Wenzlofsky

Roland Wenzlofsky is an experienced freelance Teradata Consultant & Performance Trainer. Born in Austria's capital, Vienna, he has been building and tuning some of the largest Teradata data warehouses in the European financial and telecommunications sectors for more than 20 years. He has played all the roles of developer, designer, business analyst, and project manager. Therefore, he knows like no other the problems and obstacles that make many data warehouse projects fail, and all the tricks and tips that will help you succeed.

  • Thanks for this. Can we use a merge into using values rather than a subquery as the source table?

  • {"email":"Email address invalid","url":"Website address invalid","required":"Required field missing"}

    You might also like

    >