MariaDB Transaction Isolation Levels

Question

I am facing a problem, and I am trying to wrap my head around the isolation levels. To understand these isolation levels, I've read the documentation found on the MariaDB website.

The base isolation level used by InnoDB tables is stated to be REPEATABLE_READ.

Consider the following problem. I have the following two tables structure:

/** tableA **/
id INT AUTO_INCREMENT PRIMARY_KEY

/** tableB **/
id INT
claimedBy INT NULLABLE

and also have a function, which pseudocode looks like this

/** should create a new row in "tableA", and update the rows from "tableB" which ids match the ones in the array from the input parameter, and that are null, to the id of this new row - in case the number of updated rows does not match the length of the input array, it should roll back everything **/
claim(array what) {
   - starts transaction
   - inserts a row into "tableA" and retrieve's it's id, storing it inside "variableA"
   - updates "claimedBy" to "variableA" on all rows from "tableB" that have "claimedBy" set to null and have "id" that is in "what"
   - counts the number of rows from "tableB", where "claimedBy" equals to "variableA"
     - if the count does not match the length of the "what" parameter, rolls back the transaction
     - if the count matches, commits the transaction 
}

My questions, which would help me understand isolation levels more concretly are the following:

In case two separate calls are made concurrently to this function which both have "what" arrays that intersect at any point, if I understand correctly, REPEATABLE_READ would prevent my data to become corrupted, because all the rows will be locked in the table as soon as the first update begins to perform, thus whichever function calls update is executed second, will be completely rolled back. Am I right in this? Based on the example on the official documentation it would seem like that rows are checked for the where condition and locked one-by-one. Is this the case? If yes, is it possible, that on concurrent calls to the function, both queries get rolled back? Or worse, is it possible that a deadlock would occur here?
In this concrete example, I could safely decrease the isolation level for the transaction to READ_COMMITED, which would also prevent the data corruption, but would not retain the locks for the duration of the update for rows that are not affected by the update, am I correct in this?
The lock retaining for manual TRANSACTIONS in MariaDB are for the duration of the query operation that create these locks, or for the duration of the complete transaction operation? (ie, until the transaction is either rolled back or commitd?)

FOLLOWUP QUESTION

Am I mistaken, that if using READ_COMMITED isolation, the following two concurrent calls could execute at the same time (without one, waiting for the lock of the other to be released), but not if REPEATABLE_READ isolation was used?

/** Session #1 **/
claim(array(1,2,3));

/** Session #2 **/
claim(array(4,5,6));

since claimedBy is unique, why would the count ever not match? I think you are overthinking this. — ysth, Feb 06 '22 at 19:13
use an auto_imcrent, that is is unique and so atomic as i can be, locking the table manually, will finally produce deadlock — nbk, Feb 06 '22 at 19:15
@ysth in the above example table, `claimedBy` does not have to be unique - i'll update the question I think I see where the confusion comes from. — Adam Baranyai, Feb 06 '22 at 19:22
@nbk where should I use an auto increment? I am worried by the state of "tableB" - after the function from the example returns, I want either, a new row in "tableA", connected to ALL the rows found based on the array that was supplied to the function in "tableB", or to have no change at all. I don't see how an auto increment should solve this — Adam Baranyai, Feb 06 '22 at 19:27
variableA absolutely is unique; auto increment guarantees that (and will leave gaps if generated ids are rolled back). you have nothing to worry about here, and do not need to even do the count — ysth, Feb 06 '22 at 19:33
@ysth yes, it will be unique, but for example, if I call `claim(1,2,3)`, and `claim(2,3,4)`, both instances of the claim will create separate `variableA`-s (ex: 1 and 2). Without isolation, if I am not mistaken, I could end up with a `tableB` in which rows are claimed in the following way: (1=1, 2=2, 3=1, 4=2) which would be considered corrupted data on my part. I would need either (1=1, 2=1, 3=1, 4=null) or (1=null, 2=2, 3=2, 4=2) for the integrity of `tableB` - thus, without the count I am in no way knowing that the `claim` function ran correctly or not — Adam Baranyai, Feb 06 '22 at 19:36
my `claim` function should either assign all rows to the newly inserted id which belong to the input parameters, or assign no rows at all. — Adam Baranyai, Feb 06 '22 at 19:38
@AdamBaranyai the table where you insert the row, in the update you use https://mariadb.com/kb/en/last_insert_id/ — nbk, Feb 06 '22 at 20:23

Bill Karwin · Accepted Answer · 2022-02-07T00:28:28.110

There's very little difference between REPEATABLE-READ and READ-COMMITTED in the scenario you describe.

The same locks are acquired in both cases. Locks are always held until the end of the transaction.

REPEATABLE-READ queries may also acquire gap locks to prevent new rows inserted, if those rows would change the result of some SELECT query. The MySQL manual explains gap locks better, and it works the same way in MariaDB: https://dev.mysql.com/doc/refman/8.0/en/innodb-locking.html#innodb-gap-locks Regardless, I don't think this will be an issue.

I don't think you're at risk of a deadlock in the scenario you describe. Your UPDATE should lock all the rows examined. Rows are not locked one by one; the lock request is atomic. That is, if any of the set of examined rows cannot be locked because another session already has some or all of them locked, then the new lock request waits.

Once your UPDATE succeeds (locks are acquired and then the rows are updated), then your session has them locked and keeps them locked until the end of the transaction. Subsequently doing a count would reference only locked rows, so there's no way another session could slip in and cause a deadlock.

One subtle point about locking that you may not notice in the documentation: locking SQL statements act as if they are run in READ-COMMITTED mode, even if your transaction is REPEATABLE-READ. In other words, locks are acquired on the most recently committed version of a row, even if a non-locking SELECT query would not read the most recent version of that row. This is surprising to some programmers.

Re your comments:

I wrote a demo of the locking/nonlocking odd behavior in my answer here: How MVCC works with Lock in MySql?

Regarding releasing locks, yes, that's correct, in READ-COMMITTED mode, a lock is released if your UPDATE doesn't make any net change to the row. That is, if your update sets a column to the value that it already had. But in your case, you are changing values on rows that match your conditions. You specifically select for rows where the claimedBy is NULL, and you are setting that column to a non-NULL value.

Regarding your followup question, you don't have an index on the claimedBy column, so your query will have to at least examine all the rows. In READ-COMMITTED mode, it will be able to release the lock pretty promptly on rows that don't match the search conditions. But it would be better to have an index on claimedBy so it is able to examine only rows that match the condition. I would think it's better (if only by a slight margin) to avoid locking extra rows, instead of locking them and releasing the locks.

I don't think that transaction isolation is such an important factor in performance optimization. Choosing indexes to narrow down the set of examined rows is a much better strategy in most cases.

I'm trying to understand a bit better the last paragraph as I might be one of those programmers :). Could you expand it a bit more or add a link to something that might explain that scenario in more detail? - Thanks!!!! — Augusto, Feb 06 '22 at 20:17
Hmmm - then I think I still don't understand something - based on the example in the documentation page which I've referenced, I am seeing that when using `READ-COMMITTED`, some rows are immediately unlocked after the check if they are not needed by the current transaction - based on this, I was assuming, that going with `READ-COMMITED` isolation would give me an edge, performance wise, as concurrent calls to the function when there is no overlap or intersection between the input parameter arrays, could well run concurrently, without one waiting for the others lock - am I wrong in this? — Adam Baranyai, Feb 06 '22 at 20:45
I posted a followup quesiton to illustrate my previous comment with a concrete situation, where I am assuming I would gain a performance benefit with `READ_COMMITED` instead of `REPEATABLE_READ` — Adam Baranyai, Feb 06 '22 at 20:50
I'm confused by this part of the answer "REPEATABLE-READ queries may also acquire gap locks ... ". The MariaDB documentation is confusing (to me): https://mariadb.com/kb/en/mariadb-transactions-and-isolation-levels-for-sql-server-users "REPEATABLE READ does not acquire share locks on all read rows, nor a range lock on the missing values that match a WHERE clause." Is this saying SQL doesn't do X but Maria DB does? If you are used to READ COMMITTED limitations, why would you ever by default use REPEATABLE READ with the locking problems it can cause? — GrahamB, Oct 02 '22 at 19:52
@GrahamB, You would use REPEATABLE READ if you need *non-locking* reads (e.g. plain SELECT queries) to obey repeatable read semantics. That is, if you need subsequent queries in the same transaction to view the same state of data, regardless of other changes going on concurrently. — Bill Karwin, Oct 02 '22 at 20:42
But a more realistic answer is that REPEATABLE READ is the default transaction isolation level, and 95% of developers don't understand transactions enough to even realize it's an option to change it. — Bill Karwin, Oct 02 '22 at 20:42

MariaDB Transaction Isolation Levels

1 Answers1