4

When doing an INSERT INTO {tableA} SELECT a,b,c FROM {tableB} ON DUPLICATE KEY UPDATE x=y

What is the precedence on how the duplicate keys are evaluated? I assume that MySQL first checks to see if a tuple from tableB exists clashes with a unique/primary key in tableA. If the SELECT doesn't include a primary key, or if no other duplicate key exists, then each subsequent UNIQUE INDEX "group" is evaluated and the same checking is performed. But what happens if your tableB has multiple sets of unique, multi-column indexes? Are they evaluated top-to-bottom, as described by SHOW INDEXES FROM tableB ?

Here's my SHOW INDEXES FROM <table>:

Table,Non_unique,Key_name,Seq_in_index,Column_name,Collation
daily_metrics,0,PRIMARY,1,id,A
daily_metrics,0,unique_lineItem_creative_y_m_d,1,line_item_id,A
daily_metrics,0,unique_lineItem_creative_y_m_d,2,creative_id,A
daily_metrics,0,unique_lineItem_creative_y_m_d,3,year,A
daily_metrics,0,unique_lineItem_creative_y_m_d,4,month,A
...

Imagine there are additional sets of unique indexes similar to unique_lineItem_creative_y_m_d

The docs don't seem to illustrate this behavior. https://dev.mysql.com/doc/refman/8.0/en/insert-on-duplicate.html

I also assume that the first matching unique index is used, if a match exists, without any attempt to match subsequent unique indexes that could match. In other words, the first unique index that matches is used, without regard for trying to find the best possible match across all indexes.

Jesse Skrivseth
  • 481
  • 2
  • 13
  • I general, the sort order of a select is undefined, except when you specify an order by clause. The existence of indexes can affect that order, but there is no guarantee. I'm not even sure the order of inserting/updating is guaranteed, and I can imagine it is not, to possibly support parallel inserts/updates. Anyway, if you want to use specific values, it's better to write the select in such a way that it only returns those values, and don't try to guess if the incorrect values will be overwritten before the end of the statement. – GolezTrol Mar 26 '19 at 17:09

2 Answers2

2

You are correct: as soon as MySQL detects a duplicate in any UNIQUE index, it abandons the INSERT and does the update.

The order in which MySQL evaluates the unique indexes does not change the outcome. There's no such thing as a better match for some unique index over another. Because they're unique indexes, any combination of column values that shows up as a duplicate is enough to completely specify the row to be updated.

MySQL's query planner, hopefully, chooses the index that's least costly to evaluate. But, formally speaking, the index it uses for this purpose is unpredictable. This unpredictability is an attribute of declarative languages like SQL. MySQL can do its work any way that works, and it doesn't have to tell you. It can be hard for programmers to grasp, because we're used to procedural languages.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
1

If any primary or unique keys exist in tableB, that's irrelevant. The only thing that matters for INSERT...ON DUPLICATE KEY UPDATE are the primary or unique keys of the table you're inserting into — tableA in your example.

If the values you insert to tableA are already found in any of tableA's primary or unique keys, that triggers the UPDATE part of the IODKU.

It's about the values being inserted, not the constraints of the source table.

You can also trigger the UPDATE without using any source table -- just by inserting a VALUES() clause with a set of constants.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Yes, my question is in what order is the set of tableA's primary or unique keys evaluated when attempting to insert. – Jesse Skrivseth Mar 26 '19 at 17:08
  • I'm not sure why it matters. – Bill Karwin Mar 26 '19 at 17:17
  • 1
    There is an [example here](https://mariadb.com/kb/en/library/insert-on-duplicate-key-update/) which shows that the order in which unique indexes are searched matters. If one insert violates one index on row A and violates a second index on row B, then it is unclear if row A or row B should be updated. – unutbu Mar 26 '19 at 17:39
  • @unutbu This illustrates part of my concern. It seems that the "first" match in tableA is updated - probably "first" in natural or index ordering - leaving any other possible matching rows as-is. We can control the order of our SELECT, but we can't control the search order used by the INSERT/UPDATE when searching for a matching row. Main takeaway from that doc: "If more than one unique index is matched, only the first is updated. It is not recommended to use this statement on tables with more than one unique index." – Jesse Skrivseth Mar 26 '19 at 18:10