When using SERIALIZABLE
transactions to implement a pattern of inserting a value into a database only if it does not already exist, I have observed a significant difference between MySQL and PostgreSQL in their definition of the SERIALIZABLE
isolation level.
Consider the following table:
CREATE TABLE person (
person_id INTEGER PRIMARY KEY AUTO_INCREMENT,
name VARCHAR NOT NULL
);
And the following insertion code, run in concurrently on two connections:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
START TRANSACTION;
SELECT person_id FROM person WHERE name = 'Bob Ross';
-- sync point: both transactions should run through here before proceeding to
-- demonstrate the effect
-- 0 results, so we will insert
INSERT INTO person (name) VALUES ('Bob Ross');
SELECT last_insert_id();
COMMIT;
In PostgreSQL (after appropriate translation of the SQL), the effect is as I expect: only one of the transactions can successfully commit. This is consistent with my understanding of SERIALIZABLE as described by PostgreSQL, and other sources quoting from the ANSI standard: there exists a serial execution of the transactions that would produce the same effect. There is no serial execution of these two transactions that returns 0 results for the search and then adds the entry.
In MySQL 5.7, both transactions succeed and there are 2 ‘Bob Ross’ entries in the table. The MySQL documentation defines SERIALIZABLE
in terms of prohibiting dirty reads, nonrepeatable reads, and phantom reads; it makes no reference to the existence of a serial execution.
SQLite also correctly pevents double insertion, at least in its default mode, due to its conservative locking strategy.
My question: Is MySQL's behavior in this case correct, or is it violating the SQL standard by allowing these transactions to both succeed? I suspect the answer may hinge on the definition of ‘effect’ — does observing an empty result set from the first SELECT
count as an ‘effect’ for the purposes of two serial executions having the same effect?
A couple other comments to help scope this question:
- I know I could achieve the desired behavior in MySQL by first doing an insert with
ON CONFLICT IGNORE
, and then doing the select. I am trying to understand why the equivalent standard SQL is not exhibiting the same behavior in both engines. - I know that I could probably also fix it by putting a unique constraint on the
name
field, which would arguably be a better data model anyway. But the core question still remains: why do these transactions both succeed?