8

When using SERIALIZABLE transactions to implement a pattern of inserting a value into a database only if it does not already exist, I have observed a significant difference between MySQL and PostgreSQL in their definition of the SERIALIZABLE isolation level.

Consider the following table:

CREATE TABLE person (
    person_id INTEGER PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(255) NOT NULL
);

And the following insertion code, run concurrently on two connections:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
SELECT person_id FROM person WHERE name = 'Bob Ross';
-- sync point: both transactions should run through here before proceeding to
-- demonstrate the effect

-- 0 results, so we will insert
INSERT INTO person (name) VALUES ('Bob Ross');
SELECT last_insert_id();
COMMIT;

In PostgreSQL (after appropriate translation of the SQL), the effect is as I expect: only one of the transactions can successfully commit. This is consistent with my understanding of SERIALIZABLE as described by PostgreSQL, and other sources quoting from the ANSI standard: there exists a serial execution of the transactions that would produce the same effect. There is no serial execution of these two transactions that returns 0 results for the search and then adds the entry.
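For reference, the PostgreSQL translation might look like the sketch below (I have substituted `GENERATED ALWAYS AS IDENTITY` for `AUTO_INCREMENT`; everything else is unchanged from the MySQL version):

```sql
-- PostgreSQL translation (sketch)
CREATE TABLE person (
    person_id INTEGER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    name VARCHAR NOT NULL
);

-- Run on each of two connections:
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT person_id FROM person WHERE name = 'Bob Ross';
-- sync point: bring both sessions to here before proceeding
INSERT INTO person (name) VALUES ('Bob Ross');
COMMIT;
```

With PostgreSQL's serializable snapshot isolation, the second session to commit is aborted with SQLSTATE 40001 ("could not serialize access due to read/write dependencies among transactions"), so only one 'Bob Ross' row survives.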

In MySQL 5.7, both transactions succeed and there are 2 ‘Bob Ross’ entries in the table. The MySQL documentation defines SERIALIZABLE in terms of prohibiting dirty reads, nonrepeatable reads, and phantom reads; it makes no reference to the existence of a serial execution.

SQLite also correctly prevents the double insertion, at least in its default mode, thanks to its conservative locking strategy.

My question: Is MySQL's behavior in this case correct, or is it violating the SQL standard by allowing these transactions to both succeed? I suspect the answer may hinge on the definition of ‘effect’ — does observing an empty result set from the first SELECT count as an ‘effect’ for the purposes of two serial executions having the same effect?

A couple other comments to help scope this question:

  • I know I could achieve the desired behavior in MySQL by first doing an INSERT IGNORE (MySQL's analogue of ON CONFLICT … DO NOTHING), and then doing the select. I am trying to understand why the equivalent standard SQL does not exhibit the same behavior in both engines.
  • I know that I could probably also fix it by putting a unique constraint on the name field, which would arguably be a better data model anyway. But the core question still remains: why do these transactions both succeed?
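For completeness, the workaround from the two bullets above can be sketched as follows (assuming MySQL/InnoDB; the constraint name `uq_person_name` is one I chose for illustration):

```sql
-- Enforce uniqueness declaratively, so the constraint (not the
-- transaction) rejects the duplicate.
ALTER TABLE person ADD CONSTRAINT uq_person_name UNIQUE (name);

-- INSERT IGNORE silently skips a row that would violate the unique key;
-- the follow-up SELECT fetches the id whether or not the insert happened.
INSERT IGNORE INTO person (name) VALUES ('Bob Ross');
SELECT person_id FROM person WHERE name = 'Bob Ross';
```

Under this scheme both concurrent sessions can run the pair of statements and end up agreeing on a single row, regardless of isolation level.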
Laurenz Albe
Michael Ekstrand
  • How did you perform the test? You should not be able to end up with 2 Bob Rosses in MySQL either. One transaction will wait at the `insert` step (if the `select` for the other transaction happened before that), and the other one will be rolled back with a deadlock when it tries to insert (which is fine with "serializable"). MySQL actually differs from other databases in that it does not wait at the (second) `select` already (which would produce fewer deadlocks but more waits - but this does not actually violate "serializable"). Side note: table and column names do not match the query. – Solarflare Mar 22 '18 at 00:14
  • I opened up two connections, and ran the relevant queries. I brought both connections to the indicated point (before insert), then inserted in 1, inserted in the 2, committed in 1, and committed in 2. MySQL committed both transactions, and a subsequent query found 2 Bob Ross entries. – Michael Ekstrand Mar 22 '18 at 02:26
  • I made you a test on rextester, [part 1](http://rextester.com/NBUU89499) and [part 2](http://rextester.com/LNNE12223). Open both links, run part 1, in the 5 second it runs (it uses `sleep` to pause execution) run part 2. Expected outcome: one should result in a deadlock. If you can verify that this happens for you, the second question is what's different in your setup. I do not know how familiar you are with MySQL (so you might have checked this), but the simplest explanation would be that you used the MyISAM engine, that doesn't support transactions (use `show create table person` to check). – Solarflare Mar 22 '18 at 10:00
  • I'd say the *query* isn't correct. The typical way to insert a row only if it doesn't exist is an `INSERT .. SELECT 'a','b' WHERE NOT EXISTS` or similar variations. Using a *transaction* to achieve the same effect doesn't scale well – Panagiotis Kanavos Mar 22 '18 at 10:09
  • @MichaelEkstrand Just to double check, are your MySQL tables using InnoDB or MyISAM ?(MyISAM tables are not transactional) – nos Mar 22 '18 at 10:17
  • Yes, I am using InnoDB. – Michael Ekstrand Mar 23 '18 at 16:50

2 Answers

8

The SQL standard says in chapter 4.35.4 Isolation levels of SQL-transactions (emphasis mine):

The execution of concurrent SQL-transactions at isolation level SERIALIZABLE is guaranteed to be serializable. A serializable execution is defined to be an execution of the operations of concurrently executing SQL-transactions that produces the same effect as some serial execution of those same SQL-transactions. A serial execution is one in which each SQL-transaction executes to completion before the next SQL-transaction begins.

A little further down, it goes on to confuse the issue:

The isolation level specifies the kind of phenomena that can occur during the execution of concurrent SQL-transactions. The following phenomena are possible:

[skipping definition of P1 (“Dirty read”), P2 (“Non-repeatable read”) and P3 (“Phantom”)]

The four isolation levels guarantee that each SQL-transaction will be executed completely or not at all, and that no updates will be lost. The isolation levels are different with respect to phenomena P1, P2, and P3. Table 8, “SQL-transaction isolation levels and the three phenomena” specifies the phenomena that are possible and not possible for a given isolation level.

+------------------+--------------+--------------+--------------+ 
| Level            | P1           | P2           | P3           |
+------------------+--------------+--------------+--------------+
| READ UNCOMMITTED | Possible     | Possible     | Possible     |
+------------------+--------------+--------------+--------------+
| READ COMMITTED   | Not Possible | Possible     | Possible     |
+------------------+--------------+--------------+--------------+
| REPEATABLE READ  | Not Possible | Not Possible | Possible     |
+------------------+--------------+--------------+--------------+
| SERIALIZABLE     | Not Possible | Not Possible | Not Possible |
+------------------+--------------+--------------+--------------+

NOTE 53 — The exclusion of these phenomena for SQL-transactions executing at isolation level SERIALIZABLE is a consequence of the requirement that such transactions be serializable.

This wording has had the unfortunate consequence that many implementors decided that excluding dirty reads, non-repeatable reads and phantom reads is enough to correctly implement the SERIALIZABLE isolation level, even though the note clarifies that this is not the definition, but merely a consequence of the definition.

So I would argue that MySQL is wrong, but it is not alone: Oracle Database interprets SERIALIZABLE in the same fashion.

Laurenz Albe
2

I can't reproduce this in MySQL 5.7. The other transaction always gets an error:

ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

The reason is that the SELECT's WHERE clause does not use an indexed column, so InnoDB sets shared (S) locks on every row it scans, gap locks on every gap between those rows, and a next-key lock extending to positive infinity beyond the last row. Concurrent INSERTs are therefore not possible.

One possible explanation for the results you got is this statement:

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

It sets the isolation level only for the next transaction. If you executed even a single statement (such as a SELECT) between it and START TRANSACTION, the isolation level reverted to the default (REPEATABLE READ).

Better to use

SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;
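
A quick way to confirm which level the session will actually use is to read it back before starting the test (a sketch; in MySQL 5.7 the variable is `tx_isolation`, renamed `transaction_isolation` in 5.7.20+):

```sql
SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;

-- Verify the level in effect for this session:
SELECT @@session.tx_isolation;   -- should report 'SERIALIZABLE'

START TRANSACTION;
-- ... the test from the question ...
COMMIT;
```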
Croco