
Given the following SQL pseudo-code:

BEGIN TRANSACTION;
SELECT * FROM users WHERE users.id = 5;
-- ... some calculation here ...
INSERT INTO users (id) VALUES (5);
COMMIT;

Is REPEATABLE_READ isolation guaranteed to prevent concurrent connections from inserting Users[id=5] between the query and insert, or do I need to use SERIALIZABLE isolation for that?

NOTE: I'm looking for a database agnostic solution if possible. If not, please provide empirical evidence from multiple databases so we can work against some sort of industry consensus.
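For context, one database-agnostic way to make this check-then-insert pattern safe regardless of isolation level is to let a PRIMARY KEY (or UNIQUE) constraint arbitrate, treating a constraint violation as "another connection inserted the row in between". A minimal sketch using Python's built-in sqlite3 (the `ensure_user` helper and its retry policy are illustrative, not part of the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")

def ensure_user(conn, user_id):
    """Check-then-insert; the PRIMARY KEY makes the insert safe even if
    another connection slips in between the SELECT and the INSERT."""
    row = conn.execute("SELECT * FROM users WHERE id = ?", (user_id,)).fetchone()
    if row is not None:
        return False                        # already present
    try:
        conn.execute("INSERT INTO users (id) VALUES (?)", (user_id,))
        conn.commit()
        return True                         # we inserted it
    except sqlite3.IntegrityError:
        conn.rollback()                     # a concurrent insert won the race
        return False

first = ensure_user(conn, 5)     # inserts the row
second = ensure_user(conn, 5)    # sees it already exists
```

The point is that the uniqueness check is enforced by the database at commit/insert time, not by the isolation level of the reading transaction.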

UPDATE: The original question contained an incorrect query that made use of range-locks. Since the purpose of this question is to focus on queries that do not require range-locks, I have updated the question.

Gili
  • How can it be database agnostic when different platforms can have different implementations? If you just want a quote from the standard, why don't you read the standard? If you want serious answers here you should probably focus on vendor-specific implementations instead of expecting a "this is how it works on all database platforms" answer... – Aaron Bertrand Jul 31 '13 at 01:40
  • @AaronBertrand, the specification is virtually unreadable, by humans anyway :) As for vendor-specific answers, I believe in designing against an interface instead of implementation details whenever possible. At the very least, I'm looking for some sort of industry consensus. On a side-note, there are plenty of database-agnostic questions/answers on Stackoverflow. – Gili Jul 31 '13 at 01:50
  • Unfortunately most people work with a single platform or a very limited subset, and working with a platform means you must ditch the classroom and deal with real-world implementation details, so I'm not sure there's a very large population that could even provide such consensus, never mind have any practical use for it. I'm well aware that there are many database-agnostic questions/answers on SO, that doesn't mean they're all good questions with good answers, or weren't asked before such questions were more accepted in general (the site's standards have changed a lot). – Aaron Bertrand Jul 31 '13 at 01:52
  • Funny, *I* can read the specification just fine. I wonder what race that makes me..? – RBarryYoung Aug 27 '13 at 02:55
  • @RBarryYoung, it's so readable that even with a bounty no one has been able to post an answer to this question. I think we both know the answer to this question, but providing a source is not easy. If you have a good answer, please post it. – Gili Aug 28 '13 at 14:16

2 Answers


REPEATABLE_READ doesn't block the table. It guarantees that the transaction sees the same rows at every read within the transaction. Let me explain with an example:

Time  Transaction 1                    Transaction 2
 1    Begin Tx 1
 2                                     Begin Tx 2
 4    Select count(*) from my_tab;
 5                                     Select count(*) from my_tab;
 6    Insert into ... my_tab;
 7    Commit;
 8                                     Select count(*) from my_tab;
 9                                     Insert into ... my_tab;
 10                                    Select count(*) from my_tab;
 11                                    Commit;
 12   Begin Tx 3
 13   Select count(*) from my_tab;

If my_tab has 10 rows, then the result of the count will be:

  • Time 4: 10 rows.
  • Time 5: 10 rows.
  • Time 8: 10 rows, because the transaction is in REPEATABLE_READ mode. Under READ_COMMITTED (or READ_UNCOMMITTED) the count would be 11, since Transaction 1 committed its insert at Time 7.
  • Time 10: In REPEATABLE_READ mode the count will be 11 rows (the ten original rows plus the one inserted by the current transaction). Under READ_COMMITTED it would be 12 (the ten originals plus one insert from Tx 1 and one from the current transaction).
  • Time 13: Here the count will be 12 rows for all isolation levels, since both inserts have been committed.

In SERIALIZABLE mode, concurrent transactions must produce the same effect as if they had been executed one at a time, as though no other transaction could start until the current one ends with COMMIT or ROLLBACK. While this mode guarantees the consistency of the database, it can have serious performance costs.

EDIT

From the SQL-92 standard (pages 67 and 68), hosted at http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt (link obtained from Wikipedia):

An SQL-transaction has an isolation level that is READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, or SERIALIZABLE. The isolation level of an SQL-transaction defines the degree to which the operations on SQL-data or schemas in that SQL-transaction are affected by the effects of and can affect operations on SQL-data or schemas in concurrent SQL-transactions. The isolation level of a SQL-transaction is SERIALIZABLE by default. The level can be explicitly set by the <set transaction statement>.

The execution of concurrent SQL-transactions at isolation level SERIALIZABLE is guaranteed to be serializable. A serializable execution is defined to be an execution of the operations of concurrently executing SQL-transactions that produces the same effect as some serial execution of those same SQL-transactions. A serial execution is one in which each SQL-transaction executes to completion before the next SQL-transaction begins.

The isolation level specifies the kind of phenomena that can occur during the execution of concurrent SQL-transactions. The following phenomena are possible:

1) P1 ("Dirty read"): SQL-transaction T1 modifies a row. SQL-transaction T2 then reads that row before T1 performs a COMMIT. If T1 then performs a ROLLBACK, T2 will have read a row that was never committed and that may thus be considered to have never existed.

2) P2 ("Non-repeatable read"): SQL-transaction T1 reads a row. SQL-transaction T2 then modifies or deletes that row and performs a COMMIT. If T1 then attempts to reread the row, it may receive the modified value or discover that the row has been deleted.

3) P3 ("Phantom"): SQL-transaction T1 reads the set of rows N that satisfy some <search condition>. SQL-transaction T2 then executes SQL-statements that generate one or more rows that satisfy the <search condition> used by SQL-transaction T1. If SQL-transaction T1 then repeats the initial read with the same <search condition>, it obtains a different collection of rows.

The four isolation levels guarantee that each SQL-transaction will be executed completely or not at all, and that no updates will be lost. The isolation levels are different with respect to phenomena P1, P2, and P3. Table 9, "SQL-transaction isolation levels and the three phenomena" specifies the phenomena that are possible and not possible for a given isolation level.

           Level                P1         P2         P3
    ---------------------------------------------------------
    | READ UNCOMMITTED     | Possible | Possible | Possible |
    ---------------------------------------------------------
    | READ COMMITTED       | Not      |          |          |
    |                      | Possible | Possible | Possible |
    ---------------------------------------------------------
    | REPEATABLE READ      | Not      | Not      |          |
    |                      | Possible | Possible | Possible |
    ---------------------------------------------------------
    | SERIALIZABLE         | Not      | Not      | Not      |
    |                      | Possible | Possible | Possible |
    ---------------------------------------------------------
    | Note: The exclusion of these phenomena for            |
    |       SQL-transactions executing at isolation level   |
    |       SERIALIZABLE is a consequence of the            |
    |       requirement that such transactions be           |
    |       serializable.                                   |
    ---------------------------------------------------------

EDIT 2

OK, you're interested in locks and blocking. At this point, how the RDBMS vendors implement this can vary (SQL Server, for example, behaves quite differently), but the big picture below is still useful. The following explanation applies when transactions try to modify/read the same data (rows). There are two types of locks when the RDBMS is executing a transaction:

  1. Shared locks: this kind of lock lets other transactions access the data. For example, many transactions can read the same data simultaneously.
  2. Exclusive locks: this kind of lock blocks the resource, preventing other transactions from accessing it. For example, two transactions can't modify the same data at the same time: a transaction that wants to modify the resource must first check that no exclusive lock is held on it. If there is none, the transaction acquires the exclusive lock, and until it releases that lock no other transaction can obtain it; they must wait for the exclusive lock to be released.

The second point here is: what is locked? Usually there are two levels of locking:

  1. Table-level locks: when a transaction is going to modify data, it takes an exclusive lock on the whole table; other transactions must wait until that transaction releases its locks.
  2. Row-level locks: when a transaction is going to modify data, it takes exclusive locks only on the rows involved; other transactions that want to modify the same subset of rows must wait until the first transaction ends.
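The blocking effect of an exclusive lock is easy to observe. SQLite locks at the level of the whole database file (so this is effectively the coarse, table-level case), but the principle is the same: a second writer must wait for the first writer's lock. A sketch, where the short `timeout` just turns "wait" into a catchable error:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "locks.db")

c1 = sqlite3.connect(path, isolation_level=None)
c1.execute("CREATE TABLE t (id INTEGER)")
c2 = sqlite3.connect(path, isolation_level=None, timeout=0.1)

c1.execute("BEGIN IMMEDIATE")               # c1 takes the write lock
c1.execute("INSERT INTO t VALUES (1)")
try:
    c2.execute("BEGIN IMMEDIATE")           # c2 wants the same lock...
    blocked = False
except sqlite3.OperationalError:            # ..."database is locked" after the timeout
    blocked = True

c1.execute("COMMIT")                        # c1 releases the lock
c2.execute("BEGIN IMMEDIATE")               # now c2 acquires it without waiting
c2.execute("INSERT INTO t VALUES (2)")
c2.execute("COMMIT")
```

With row-level locking (InnoDB, PostgreSQL, etc.) the same pattern would only block when the two writers touch the same rows.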

A note about deadlocks. Imagine the following scenario:

Time  Transaction 1                        Transaction 2
 1    Begin Tx 1                        
 2                                         Begin Tx 2
 4    Update table X set... where id = 5        
 5                                         Update table X set ... where id = 5;
 6    Update table X set... where id = 6
 7    Commit;
 8                                         Commit;

If the database is configured for row-level locking, then Transaction 2 will wait at Time 5 until Time 7, because Transaction 1 acquired the exclusive lock first. Now imagine the following example:

Time  Transaction 1                        Transaction 2
 1    Begin Tx 1                        
 2                                         Begin Tx 2
 4    Update table X set... where id = 5        
 5                                         Update table X set ... where id = 6;
 6    Update table X set... where id = 6
 7                                         Update table X set ... where id = 5;
 8    Commit;
 9                                         Commit;

This scenario is known as a 'deadlock': at Time 6, Transaction 1 waits for the release of the lock acquired by Transaction 2 at Time 5, but at Time 7 Transaction 2 must in turn wait for the lock acquired by Transaction 1 at Time 4. Different RDBMSs manage deadlocks in different ways; for example, MySQL with InnoDB will raise a deadlock exception for one of the transactions, letting the other finish fine.
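The lock-ordering cycle above is not SQL-specific; it can be sketched with two plain mutexes standing in for the row locks. Here the acquire timeout plays the role of the RDBMS's deadlock detector: one "transaction" gives up (is chosen as the victim and would be rolled back), which lets the other proceed. Which transaction a real database picks as the victim is implementation-defined; this sketch simply gives tx1 less patience:

```python
import threading

lock_row_5 = threading.Lock()   # stands in for the row lock on id = 5
lock_row_6 = threading.Lock()   # stands in for the row lock on id = 6
barrier = threading.Barrier(2)  # ensure both hold their first lock before proceeding
victims = []

def tx1():
    with lock_row_5:                               # Time 4: update id = 5
        barrier.wait()
        if not lock_row_6.acquire(timeout=0.3):    # Time 6: needs id = 6, held by tx2
            victims.append("tx1")                  # gives up: the deadlock victim
        else:
            lock_row_6.release()
    # leaving the with-block releases lock_row_5, unblocking tx2

def tx2():
    with lock_row_6:                               # Time 5: update id = 6
        barrier.wait()
        if not lock_row_5.acquire(timeout=2.0):    # Time 7: needs id = 5, held by tx1
            victims.append("tx2")
        else:
            lock_row_5.release()                   # succeeds once tx1 has backed off

threads = [threading.Thread(target=tx1), threading.Thread(target=tx2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the timeouts, both threads would wait on each other forever, which is exactly the cycle a deadlock detector exists to break.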


Best regards

Ernesto Campohermoso
  • Does your answer hold if I replace the query by `SELECT * FROM users WHERE users.id = 5`? Meaning, does `REPEATABLE_READ` prevent inserts so long as I don't use range locks? – Gili Jul 31 '13 at 01:58
  • Yes, but in this context you need to check the locking level; depending on whether it is set to table level or row level you can run into blocking issues and deadlocks. – Ernesto Campohermoso Jul 31 '13 at 02:00
  • Putting aside deadlocks for now, you're saying I need `SERIALIZABLE` to prevent inserts if range-locks are used, and `REPEATABLE_READ` otherwise. Can you provide references for this answer in the form of the SQL standard or empirical evidence from multiple databases? – Gili Jul 31 '13 at 02:02
  • Ok, the book with the theory is http://shop.oreilly.com/product/mobile/0636920022343.do; maybe you should read how Oracle, MySQL and PostgreSQL manage repeatable read with MVCC. In SQL Server the story is very different – Ernesto Campohermoso Jul 31 '13 at 03:08
  • *‘Time 12: Here the number of rows will be 13 for al transaction modes.’* – Either `13` is a typo or I must have missed something. The initial row count is 10 and each of the two transactions prior to Time 12 inserts one row. That totals to 12 in my book. – Andriy M Jul 31 '13 at 07:57
  • I've updated the question to focus exclusively on queries that do not require range-locks. Please update the question to deal with this case exclusively (remove all unrelated content). – Gili Aug 26 '13 at 17:00

No. The REPEATABLE_READ isolation level only guarantees that you will see the same data if you repeat the SELECT within the same transaction, which means implementations are allowed to hide from your transaction insertions performed by other transactions.

The only isolation level able to preclude insertions is SERIALIZABLE, because it guarantees that if you have multiple transactions operating concurrently, the results will be as if they had run sequentially in some order. The specific order in each case is not specified nor reported, but it's guaranteed that there exists one that would have produced the same final results.

One of the conditions of the question is a database-agnostic (and I assume high-level answer), so I will not mention how the different isolation levels work in different DB implementations or how they translate into locks.

Mario Rossi
  • http://publib.boulder.ibm.com/infocenter/wxsinfo/v7r0/index.jsp?topic=%2Fcom.ibm.websphere.extremescale.prog.doc%2Fcxsmapentry.html provides an example of `REPEATABLE_READ` returning different results when querying a range of data. See: `Phantom reads are possible when you are using queries or indexes because locks are not acquired for ranges of data` – Gili Sep 30 '14 at 02:39