4

maybe you can shed light on something for me here:

DB = MySQL 5.7

Storage engine: InnoDB

Isolation level: Repeatable Read

Following table:

---------------
|   MyTable   |
---------------
| PK | Concur |
---------------
| 3  |   2    |
---------------

I have no transaction going at this point in time and I select this record like

SELECT * FROM MyTable WHERE PK = 3

and store the result in my program.

I start now a DB transaction. An outside process increments Concur for the record with PK = 3 from 2 to 3 after my transaction has started.

I have not yet read again from that table inside my transaction.

I issue the following query from inside my transaction:

UPDATE MyTable SET Concur = 3 WHERE PK = 3 AND Concur = 2

This will succeed with 0 records affected. So clearly, it evaluates against the data that has been altered AFTER my transaction has started. Still in the transaction I subsequently query:

SELECT * FROM MyTable WHERE PK = 3

which will return me the record with PK = 3 and Concur = 2 which are the values from before the transaction.

Why do SELECT and UPDATE ... WHERE behave differently, what am I missing?

I would have expected that the UPDATE ... WHERE statement either fails directly instead of succeeding with 0 records affected, or alternatively it succeeds there with 1 record affected and then blows at the COMMIT afterwards, but not this mix and match.

Any insight here?

Frank J
  • 1,666
  • 13
  • 19

3 Answers3

6

https://dev.mysql.com/doc/refman/8.0/en/innodb-consistent-read.html

A consistent read means that InnoDB uses multi-versioning to present to a query a snapshot of the database at a point in time. The query sees the changes made by transactions that committed before that point of time, and no changes made by later or uncommitted transactions. The exception to this rule is that the query sees the changes made by earlier statements within the same transaction. This exception causes the following anomaly: If you update some rows in a table, a SELECT sees the latest version of the updated rows, but it might also see older versions of any rows. If other sessions simultaneously update the same table, the anomaly means that you might see the table in a state that never existed in the database.

The important condition is if you change the rows, your consistent read is "refreshed" so it includes the change you just made.

But if you UPDATE, it's always made to the most recent version of the row, not the version your transaction's consistent read can view. Therefore your UPDATE may have no net effect, if another transaction has already made that change. That's the situation you observed.

Therefore your transaction issued an UPDATE but didn't change the row.

This is perhaps not how you would like InnoDB to behave, but it is nevertheless how it does behave.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
  • Thank's that's what I was looking for - UPDATE...WHERE behaves by design equivalent to READ COMMITTED behaves for SELECTs. That's unfortunate. I don't actually want to lock the row and I can't switch the isolation level - so to get a READ COMMITTED behavior temporarily from a SELECT I suppose I would have to use a separate connection outside of this current transaction, unless you can think of any other way maybe? – Frank J Dec 12 '19 at 13:59
  • Set READ COMMITTED as your isolation level before you start your transaction? – Bill Karwin Dec 12 '19 at 17:26
  • For what it's worth, any locking SELECT will also get the same behavior. Any locking statement acts like READ COMMITTED, so you can use `SELECT ... LOCK IN SHARE MODE` (this is changed to `SELECT ... FOR SHARE` in MySQL 8.0). – Bill Karwin Dec 12 '19 at 17:27
  • The READ COMMITTED would be ideal, but I can't do that currently for other reasons I don't have influence on. I wish UPDATE and SELECT wouldn't behave in this case like they have different isolation levels. And I on purpose don't lock in the beginning because until I get to the update it can be several minutes. We are using that row as signaling and concurrency mechanism. Seems to be the wrong tool for the job unfortunately – Frank J Dec 13 '19 at 14:02
  • Yeah, using transaction-scoped data is the wrong tool for that task. You need some kind of global semaphore that is not subject to transaction isolation. I've seen systems that use a key in a Redis instance, for example, so all your app instances and threads can access it globally. – Bill Karwin Dec 13 '19 at 17:36
  • We may not be in the same building, not talking about the same process. That's why we wanted to use the DB for signaling / concurrency. – Frank J Dec 13 '19 at 19:31
  • Well, regardless, you need to use something that is not limited by transaction isolation. Being in the same building or not is not the point. If the two clients can access the same database, they can access the same Redis instance, or the same instance of a lock server or any other shared technology. If you must use MySQL, then perhaps use a MyISAM table or [GET_LOCK()](https://dev.mysql.com/doc/refman/5.7/en/locking-functions.html#function_get-lock) – Bill Karwin Dec 13 '19 at 20:44
  • I hear you, back to the drawing board. Thank you for your help! – Frank J Dec 16 '19 at 15:00
1

Signal your intent with FOR UPDATE:

BEGIN;
SELECT ... FOR UPDATE;
...
(no other thread can change that row until you `COMMIT` or `ROLLBACK`)
...
COMMIT;

On the other hand...

BEGIN;
...
(At this point another thread modifies the row...)
...
SELECT ... FOR UPDATE;  -- you are blocked until they COMMIT or ROLLBACK
Rick James
  • 135,179
  • 13
  • 127
  • 222
  • I looked into that, but I actually do not want to lock the row at this point because the process in the transaction can actually take several minutes before we get to the UPDATE...WHERE part, and I don't want to block anything that long. – Frank J Dec 12 '19 at 14:02
  • @FrankJ In _that_ case, you need a column that you change in a transaction, go do the stuff, then come back in _another_ transaction to undo the flag. And all relevant processes must honor that flag. – Rick James Dec 12 '19 at 16:49
  • Ya that was my next logical thought, unfortunately we get into race conditions there then. – Frank J Dec 13 '19 at 14:03
  • @FrankJ - Ant that's why I said "must honor that flag". That is, it becomes the application's problem. (A long-running 'transaction' is not a good idea in the database.) – Rick James Dec 13 '19 at 20:18
  • It is an automation process that can take minutes and in case for very large clients half an hour and must fail / rolled back wholly. So I do need the transaction. I can sidestep this with a secondary DB connection to reduce the time for the race condition. This is unfortunately a pretty complex scenario and a bunch of stuff is "fixed" that I have no influence on. Thank you for your help though. I'll have to go back to the drawing board. – Frank J Dec 16 '19 at 14:59
0

To get affected by the Repeatable Read isolation level, you must be inside the same transaction. This means that your two selects must be inside a transaction, so they are not afected by wherever you change in your DB from an external transaction.

So, as you say, I have no transaction going at this point in time and I select this record like SELECT * FROM MyTable WHERE PK = 3 and store the result in my program. you are making a transaction with just one statement. After that, you start a transaction with your Update.

What you should do is

 START TRANSACTION
      SELECT * FROM MyTable WHERE PK = 3
      ------                                        START TRANSACTION
      -----                                             UPDATE MyTable SET Concur = 3 WHERE PK = 3 AND Concur = 2                                            
      -------                                       END TRANSACTTION
      -------
      -------
      SELECT * FROM MyTable WHERE PK = 3
 END TRANSACTION

In the left side, you have your select transaction, and in the right side your update. The left transaction is not afected by the update in this way.

nacho
  • 5,280
  • 2
  • 25
  • 34
  • I understand this, but in my question I'm asking about the difference in a UPDATE..WHERE vs SELECT. UPDATE..WHERE behaves like READ COMMITTED in this case. Alternatively, if the read from the UPDATE...WHERE actually creates the snapshot because it is the first read on that table after the transaction has started then the subsequent SELECT should also return the new value. Essentially i'm asking about the seeming discrepancy of the isolation level between UPDATE...WHERE and SELECT or if they are the same the inconsistency of when the snapshot is created across the two. I hope that makes sense. – Frank J Dec 11 '19 at 17:00