
I have a simple table:

CREATE TABLE test (
  col INT,
  data TEXT,
  KEY (col)
);

and a simple transaction:

START TRANSACTION;

SELECT * FROM test WHERE col = 4 FOR UPDATE;

-- If no results, generate data and insert
INSERT INTO test SET col = 4, data = 'data';

COMMIT;

I am trying to ensure that two copies of this transaction running concurrently result in no duplicate rows and no deadlocks. I also don't want to incur the cost of generating data for col = 4 more than once.

I have tried:

  1. SELECT .. (without FOR UPDATE or LOCK IN SHARE MODE):

    Both transactions see that there are no rows with col = 4 (without acquiring a lock) and both generate data and insert two copies of the row with col = 4.

  2. SELECT .. LOCK IN SHARE MODE

    Both transactions acquire a shared lock on col = 4, generate data and attempt to insert a row with col = 4. Each then waits for the other to release its shared lock so that it can INSERT, resulting in ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction.

  3. SELECT .. FOR UPDATE

    I would expect that one transaction's SELECT will succeed and acquire an exclusive lock on col = 4 and the other transaction's SELECT will block waiting for the first.

    Instead, both SELECT .. FOR UPDATE queries succeed and the transactions proceed to deadlock just like with SELECT .. LOCK IN SHARE MODE. The exclusive lock on col = 4 just doesn't seem to work.
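
    As far as I can tell, the interleaving looks like this (T1 and T2 are the two copies of the transaction):

    T1> SELECT * FROM test WHERE col = 4 FOR UPDATE;  -- returns no rows, succeeds
    T2> SELECT * FROM test WHERE col = 4 FOR UPDATE;  -- also succeeds instead of blocking
    T1> INSERT INTO test SET col = 4, data = 'data';  -- blocks
    T2> INSERT INTO test SET col = 4, data = 'data';  -- ERROR 1213 (40001): Deadlock found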

How can I write this transaction without causing duplicate rows and without deadlock?

Jesse
  • Don't do this. Do set a `UNIQUE` index and use `INSERT ... ON DUPLICATE KEY` or `INSERT IGNORE` instead. – tadman Mar 02 '17 at 05:40
  • @tadman If I do `INSERT .. ON DUPLICATE KEY UPDATE` then both transactions will incur the cost of generating `data`, unnecessarily. If I do `INSERT IGNORE ..` then *all* errors MySQL encounters while running the `INSERT` will be ignored (not just duplicate keys), which is pretty sloppy. – Jesse Mar 02 '17 at 05:45
  • What is the value of `tx_isolation`? – Rick James Mar 04 '17 at 00:32
  • @RickJames `REPEATABLE READ` (`SERIALIZABLE` would just make the `SELECT ..` version behave the same as `SELECT .. LOCK IN SHARE MODE`) – Jesse Mar 04 '17 at 09:58
  • Good question. Never thought of it. – shehanpathi Jun 07 '22 at 15:28

6 Answers


Adjust your schema slightly:

CREATE TABLE test (
  col INT NOT NULL PRIMARY KEY,
  data TEXT
);

With col being a primary key, it cannot be duplicated.

Then use the ON DUPLICATE KEY feature:

INSERT INTO test (col, data) VALUES (4, ...)
  ON DUPLICATE KEY UPDATE data=VALUES(data)
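
If you need to know which path was taken, the affected-rows count distinguishes them (a sketch; MySQL reports 1 affected row for a fresh insert, 2 for an update of an existing row, and 0 if the row already held identical values):

INSERT INTO test (col, data) VALUES (4, 'data')
  ON DUPLICATE KEY UPDATE data = VALUES(data);
SELECT ROW_COUNT();  -- 1 = inserted, 2 = updated, 0 = no change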
tadman
  • If I do `INSERT .. ON DUPLICATE KEY UPDATE` then both transactions will incur the cost of generating `data`, unnecessarily. I only want one of them to proceed to generate `data`. – Jesse Mar 02 '17 at 05:47
  • Your example given in the question does no generating. It has a simple string. If you want to avoid duplicate generation, find out which IDs you need to generate data for and use some kind of queue to schedule those operations. There are a multitude of job-queue systems out there to pick from. Holding open a lock for extended periods of time is *begging* for deadlocks. – tadman Mar 02 '17 at 06:01

Maybe this...

START TRANSACTION;
INSERT IGNORE INTO test (col, data) VALUES (4, NULL);  -- or ''
-- if Rows_affected() == 1, the placeholder row was inserted: generate data and replace `data`
    UPDATE test SET data = 'data' WHERE col = 4;
COMMIT;

Caution: If the PRIMARY KEY is an AUTO_INCREMENT, this may 'burn' an id.
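
Spelled out, the check might look like this (a sketch, assuming col is a PRIMARY KEY or UNIQUE key so INSERT IGNORE actually suppresses the duplicate; @inserted is just an illustrative session variable):

START TRANSACTION;
INSERT IGNORE INTO test (col, data) VALUES (4, NULL);
SET @inserted = ROW_COUNT();  -- 1 = this session owns the placeholder row, 0 = it already existed
-- application code: if @inserted = 1, generate the expensive data here
UPDATE test SET data = 'data' WHERE col = 4 AND @inserted = 1;
COMMIT;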

Rick James
  • `INSERT IGNORE` ignores all errors and warnings, not just duplicate key errors, so it throws away my ability to detect and report other kinds of errors and warnings that might occur during that query. – Jesse Mar 17 '17 at 03:23

Note that InnoDB has two types of exclusive locks: one for update and delete, and another one for insert. So to execute your SELECT FOR UPDATE transaction, InnoDB first has to take the lock for update in one transaction; the second transaction then tries to take the same lock and blocks waiting for the first (it couldn't have succeeded as you claimed in the question). When the first transaction goes on to execute its INSERT, it has to change its lock from the lock for update to the lock for insert. The only way InnoDB can do that is to first downgrade the lock to a shared one and then upgrade it back to a lock for insert, and it can't downgrade the lock while another transaction is waiting to acquire that same exclusive lock. That's why you get a deadlock error in this situation.

The only way to execute this correctly is to have a unique index on col, try to INSERT the row with col = 4 (you can insert dummy data if you don't want to generate the real data before the INSERT), roll back if you get a duplicate-key error, and UPDATE the row with the correct data if the INSERT succeeded. Note, though, that if you don't want to incur the cost of generating data unnecessarily, that probably means generating it takes a long time, and all that time you will hold open the transaction that inserted the row with col = 4, leaving every other process that tries to insert the same row hanging. I'm not sure that would be significantly better than generating the data first and then inserting it.
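
A sketch of that approach in MySQL terms (assuming a UNIQUE index on col; the duplicate-key error and the rollback are handled by the application):

ALTER TABLE test ADD UNIQUE KEY (col);

START TRANSACTION;
-- a duplicate-key error (1062) here means another session won: roll back and stop
INSERT INTO test (col, data) VALUES (4, NULL);
-- reached only if the INSERT succeeded: generate the expensive data now
UPDATE test SET data = 'data' WHERE col = 4;
COMMIT;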


If your goal is to have only one session insert the missing row, with every other session doing nothing without even attempting an insert of DATA, then you need to either lock the entire table (which reduces your concurrency) or insert an incomplete row and follow it with an update.

First, create a primary key on column COL.

Code:

begin
  insert into test values (4,null);
  update test set data = ... where col = 4;
  commit;
exception
  when dup_val_on_index then
    null;
end;

The first session that attempts the insert on col 4 will succeed and proceed to the update, where you can do the expensive calculation of DATA. Any other session trying to do this will raise a PK violation (ORA-00001, or DUP_VAL_ON_INDEX) and go to the exception handler, which traps it and does nothing (NULL). It will never reach the update statement, so it won't do whatever expensive thing it is you do to calculate DATA.

Now, this will cause the other session to wait while the first session calculates DATA and does the update. If you don't want that wait, you can use NOWAIT to cause the lagging sessions to throw an exception immediately if the row is locked. If the row doesn't exist, that will also throw an exception, but a different one. Not great to use exception handling for normal code branches, but hey, it should work.

declare
  var_junk number;
begin
  begin
    select col into var_junk from test where col = 4 for update nowait;
  exception
    when no_data_found then
      insert into test values (4, null);
      update test set data = ... where col = 4;
      commit;
    when others then
      null;
  end;
end;
  
Paul W

Use FOR UPDATE, and then either:

  • change the isolation level to READ COMMITTED, which avoids gap locks
  • or change the isolation level to SERIALIZABLE
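
A minimal sketch of the first option (the isolation level has to be set before the transaction starts):

SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

START TRANSACTION;
SELECT * FROM test WHERE col = 4 FOR UPDATE;  -- no gap lock under READ COMMITTED
INSERT INTO test SET col = 4, data = 'data';
COMMIT;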
vishun

Here is the query that is currently in use and has been working perfectly even at peak load:

INSERT INTO users (users_email, users_name)
SELECT * FROM (
  SELECT "email@domainname.com" AS abc, "user" AS abd
) AS tmp
WHERE NOT EXISTS (
  SELECT users_id FROM users WHERE users_email = "email@domainname.com"
)
LIMIT 1;
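
Applied to the test table from the question, the same pattern would be (a sketch):

INSERT INTO test (col, data)
SELECT * FROM (SELECT 4 AS c, 'data' AS d) AS tmp
WHERE NOT EXISTS (SELECT col FROM test WHERE col = 4)
LIMIT 1;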