
I'm trying to implement pessimistic locking using SELECT ... FOR UPDATE, because I want other threads to wait until the lock on the selected row is released. From going through Spring JDBC select for update and various similar threads, the part I have understood is that this is achievable when the select and the update happen within the same method, so that they are part of the same transaction.

The issue in my case is that I have a JAR with the DAO functionality, in which a selectForUpdate method and a separate update method are available. Both methods have a finally block that contains

resultSet.close();
statement.close();
connection.close();

Now I'm struggling to find out whether there is a way to use both methods from outside of the JAR, perhaps by annotating my calling method with @Transactional, and make it work in some way, so that the lock is only released once the update method has been executed.

Mark Rotteveel
ABC
  • How do the methods obtain a connection? – crizzis May 23 '21 at 12:41
  • javax.sql.DataSource object has been autowired in Repository class, and connection is obtained using javax.sql.DataSource.getConnection(). This is part of the JAR where DAO code resides – ABC May 23 '21 at 12:49
  • In that case, provided that the JARs themselves do not create separate `DataSource`s, you should be good with `@Transactional` – crizzis May 23 '21 at 12:50
  • @crizzis `@Autowired javax.sql.DataSource` is part of jar itself, apologies in case my above comment didn't clearly specify that. Will still `@Transactional` work? – ABC May 23 '21 at 12:57
  • Just to clarify: do you have multiple `DataSource` beans or just one? – crizzis May 23 '21 at 12:58
  • @crizzis This jar is being used from multiple microservices and hence each will have their own `DataSource` beans also same microservice might have multiple instances running and each instance will have their own bean:( – ABC May 23 '21 at 13:08

1 Answer


You're making a mistake: using the wrong tool for the job. Transaction levels and FOR UPDATE have the purpose of ensuring data integrity. Period. They aren't designed for control flow, and if you use them for this, it will bite you in the butt sooner rather than later.

Let me try to explain what SELECT FOR UPDATE is for, so that, when I later tell you that it is most definitely not for what you're trying to do with it, it is easier to follow.

Imagine a bank. Simple enough. The bank has some ATMs out front and a website where you can see your transactions and transfer money to other accounts.

Imagine you (ABC) and I (Reinier) are trying to fleece the bank some. Here is our plan: We set it up so that you have €1000,- in your account and I have nothing.

Then, you log into the website from your phone, and start a transfer, transferring €1000,- to my account. But, while you're doing that, right in the middle, you withdraw €10,- from the ATM.

If the bank messed up their transactions, it's possible you end up with €990,- in your account and I have €1000,- in my account, and we fleeced the bank. This is how that could happen (and if halfway through the example you think "I already know this stuff, I know what FOR UPDATE does!" - I'm not so sure you do; read it carefully).

ATM code

startTransaction();
int currentBalance = sql("SELECT balance FROM account WHERE user = ?", abc);
if (currentBalance < requestedWithdrawal) throw new InsufficientFundsEx();
sql("UPDATE account SET balance = ? WHERE user = ?", currentBalance - requestedWithdrawal, abc);
commit();
moneyHopper.spitOut(requestedWithdrawal);

Website code

startTransaction();
int balanceTo = sql("SELECT balance FROM account WHERE user = ?", reinier);
int balanceFrom = sql("SELECT balance FROM account WHERE user = ?", abc);
if (transfer > balanceFrom) throw new InsufficientFundsEx();
sql("UPDATE account SET balance = ? WHERE user = ?", balanceTo + transfer, reinier);
sql("UPDATE account SET balance = ? WHERE user = ?", balanceFrom - transfer, abc);
commit();
controller.notifyTransferSucceeded();

How it can go wrong

The way it goes wrong is one of two interleavings. Either the balanceTo and balanceFrom reads are 'locked in', then the ATM withdrawal goes through, and then the update SQL statements from the website transaction go through (this effectively wipes out the ATM withdrawal; whatever the ATM spat out is free money). Or the ATM's balance check locks in, then the transfer goes through, and then the ATM's update goes through (which gives the recipient, i.e. me, their €1000,-, and ensures that the ATM code's update, setting your balance to €990,-, is the last thing that happens, giving us €990,- of free money).
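The first of those interleavings (website reads lock in, ATM runs to completion, stale website updates overwrite) can be replayed deterministically in plain Java; the map and names below are just an in-memory stand-in for the account table, not real bank code:

```java
import java.util.HashMap;
import java.util.Map;

public class LostUpdateDemo {
    // Replay the bad interleaving step by step, single-threaded,
    // so the lost update is reproducible rather than a flaky race.
    static Map<String, Integer> replay() {
        Map<String, Integer> accounts = new HashMap<>();
        accounts.put("abc", 1000);
        accounts.put("reinier", 0);

        // Website transaction 'locks in' both reads...
        int balanceTo = accounts.get("reinier");
        int balanceFrom = accounts.get("abc");

        // ...then the ATM withdrawal runs start to finish:
        accounts.put("abc", accounts.get("abc") - 10);   // abc: 990, €10 cash dispensed

        // ...then the website applies updates computed from the stale reads:
        accounts.put("reinier", balanceTo + 1000);       // reinier: 1000
        accounts.put("abc", balanceFrom - 1000);         // abc: 0; the ATM's -10 is wiped out
        return accounts;
    }
}
```

The accounts still sum to €1000,-, but €10,- in cash also left the ATM: money created from thin air.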

So what's the fix? Hint: Not FOR UPDATE

The fix is to consider what a transaction means. The purpose of transactions is to turn operations into atomic notions. Either both your account is reduced by the transfer amount and mine is raised by the same, or nothing happens.

It's obvious enough with statements that change things (UPDATE and INSERT). It's a bit more wonky when we talk about reading data. Should those reads be considered part of the transaction?

One way to go is to say: no, unless you add FOR UPDATE at the end, in which case, yes - i.e. those rows are locked until the transaction ends, but only if FOR UPDATE is applied.

But that is not the only way to ensure data integrity.

Optimistic locking to the rescue - or rather, to your doom

A much more common way is called MVCC (MultiVersion Concurrency Control) and is far faster. The idea behind MVCC (also called optimistic locking), is to just assume no clashes ever occur. Nothing is ever locked. Instead, [A] all changes made within a transaction are completely invisible to things running in any other transaction until you commit, and [B] when you COMMIT a transaction, the database checks if everything you have done within the span of this transaction still 'holds up' - for example, if you updated a row within this transaction that was also modified by another transaction that has committed already, you get an error when you commit, not when you ran the UPDATE statement.
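One common concrete form of optimistic locking, the version check at commit time, can be sketched in plain Java. The class below is an in-memory stand-in for a single DB row, not any real database's API; a real MVCC database does this per row, transparently:

```java
import java.util.concurrent.atomic.AtomicReference;

// An in-memory stand-in for one DB row: a balance plus a version number.
class VersionedAccount {
    record Snapshot(int balance, long version) {}

    private final AtomicReference<Snapshot> row =
            new AtomicReference<>(new Snapshot(1000, 0));

    // 'SELECT': reads never block; they just see the current version.
    Snapshot read() { return row.get(); }

    // 'COMMIT': succeeds only if nobody else committed since our read;
    // otherwise the caller must retry from the top.
    boolean commit(Snapshot asRead, int newBalance) {
        return row.compareAndSet(asRead, new Snapshot(newBalance, asRead.version() + 1));
    }
}
```

A caller that gets `false` back from `commit` has read stale data and must redo its whole read-compute-write sequence, which is exactly the retry model described below.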

In this framework, we can still talk about what SELECT even means. In Java/JDBC, this is the transaction isolation level, configurable per connection via Connection.setTransactionIsolation(...). The best level, the level the bank should be using to avoid this issue, is Connection.TRANSACTION_SERIALIZABLE. Serializable effectively means everything dirties everything else: if during a transaction you read some data, and, when you commit, that same SELECT statement would have produced different results because some other transaction modified something, then the COMMIT just fails.

Such commits fail with a retry error (in JDBC terms, typically an SQLException whose SQLState is 40001, serialization failure). This means literally what it says: just start your transaction over, from the top. It makes sense if you think about the bank example: what WOULD have happened, had the bank done it right and set the serializable transaction isolation level, is that either the ATM's transaction or the transfer transaction would get the retry error. Assuming the bank wrote their code right and actually does what the error tells you to (start over), they would start over, and that includes re-reading the balances. No cheating of the bank can occur now.

Crucially, in the SERIALIZABLE model, locking NEVER occurs, and FOR UPDATE does not mean anything at all.

Thus, depending on how the DB is set up, FOR UPDATE usually does literal stone cold nothing: a complete no-op.

FOR UPDATE does not mean 'lock other transactions that touch this row'. No matter how much you want it to.

Some DB implementations, or some combinations of DB engine and connection configuration, may be implemented in that fashion, but that is an extremely finicky setup, and your app would have to ship documentation strongly recommending that the operator never change the DB settings, never switch DB engines, never update the DB engine, never update the JDBC driver, and never mess with the connection settings.

That's the kind of silly caveat you really, really don't want to put on your code.

The solution is to stop buttering your toast with that chainsaw. Even if you think you can manage to get some butter on that toast with it, it's just not what it was made for, like at all, and we're all just waiting until you lose a thumb here. Just stop doing it. Get a butterknife, please.

If you want to have one thread wait for another, don't use the database; use a lock object. If you want to have one process wait for another, don't use the database, and don't use a lock object (you can't; processes don't share memory); use a file. The newer java.nio file API has an option to create a file atomically: if the file already exists, it throws an exception; otherwise it creates the file. It does so atomically, meaning that if two processes both run this 'atomically create new file' code, you have a guarantee that one succeeds and one throws.
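A minimal sketch of that atomic-create handshake with java.nio (the file name and helper method names are made up for illustration):

```java
import java.nio.file.*;

public class FileLockDemo {
    // Try to acquire the cross-process 'lock' by atomically creating a marker file.
    // Files.createFile throws FileAlreadyExistsException if another process won the race.
    static boolean tryAcquire(Path lockFile) {
        try {
            Files.createFile(lockFile);   // atomic create-if-absent
            return true;                  // we own the lock now
        } catch (FileAlreadyExistsException e) {
            return false;                 // somebody else owns it
        } catch (java.io.IOException e) {
            throw new java.io.UncheckedIOException(e);
        }
    }

    // Release by deleting the marker file so another process can acquire it.
    static void release(Path lockFile) throws java.io.IOException {
        Files.deleteIfExists(lockFile);
    }
}
```

Note that a crashed process leaves the marker file behind, so real deployments also need a staleness policy; this sketch only shows the atomicity guarantee.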

If you want data integrity, and that's the only reason you wanted pessimistic locking in the first place, stop thinking that way: it's the DB's job, not your job, to guarantee data integrity. MVCC/optimistic locking DBs guarantee that the bank will never get fleeced, no matter how hard you try with the shenanigans at the top of this answer, and pessimistic locking isn't involved at all.

JDBC itself sucks for 'end use' like what you are doing here (intentionally, for reasons a bit too long to get into). Get yourself an abstraction that makes it nice, such as JDBI or jOOQ. These tools also offer the only proper way to interact with databases: all DB code goes in a lambda. That's because you don't want to handle those retry exceptions manually; you want your DB access framework to take care of that. This is what the bank code should really look like (sketched against a hypothetical dbAccess helper, not the literal JDBI or jOOQ API):

dbAccess.run(db -> {
    int balance = db.sql("SELECT balance FROM account WHERE user = ?", abc);
    if (balance < requested) throw new InsufficientBalanceEx();
    db.update("UPDATE account SET balance = ? WHERE user = ?", balance - requested, abc);
    return requested;
});

This way, the 'framework' (the code behind that run method) can catch the retry error and just rerun the lambda as often as it needs to. Rerunning is tricky: if two threads on a server each cause the other to retry, which is not that hard to do, you can get into an endless loop where both restart and both again cause the other to retry, ad infinitum. The solution is literally dice-throwing. When retrying, roll a random number and wait that many milliseconds, and for every further retry, increase the range you're rolling on. If this sounds dumb to you, know that you're using it right now: it's how Ethernet works, too (Ethernet uses randomized backoff when collisions occur on the wire). Ethernet won; token ring lost. It's the exact same principle at work (token ring is pessimistic locking; Ethernet is optimistic: just try it, detect if it went wrong, then redo it, with some randomized exponential backoff sprinkled in to ensure you don't get two systems in lock-step forever screwing up each other's attempts).
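That randomized, growing backoff could be sketched like this (the RetryableTx interface, the attempt limit, and the starting range are all made up for illustration, not any framework's real API):

```java
import java.util.concurrent.ThreadLocalRandom;

public class RetryRunner {
    // Thrown by the transaction body when the DB reports a serialization failure.
    static class RetryException extends RuntimeException {}

    @FunctionalInterface
    interface RetryableTx<T> { T run() throws RetryException; }

    // Rerun the transaction body until it succeeds, sleeping a random,
    // exponentially growing amount between attempts (dice-throwing backoff).
    static <T> T runWithRetry(RetryableTx<T> tx, int maxAttempts) throws InterruptedException {
        long maxSleepMillis = 10;
        for (int attempt = 1; ; attempt++) {
            try {
                return tx.run();
            } catch (RetryException e) {
                if (attempt >= maxAttempts) throw e;   // give up eventually
                // Sleep 0..maxSleepMillis ms, then double the range for next time.
                Thread.sleep(ThreadLocalRandom.current().nextLong(maxSleepMillis + 1));
                maxSleepMillis *= 2;
            }
        }
    }
}
```

The randomness is what breaks the lock-step: two competing transactions are overwhelmingly unlikely to roll the same wait twice in a row.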

rzwitserloot
  • Thanks for such a detailed answer. Really appreciate it, I understood it and really this makes sense. Just one more thing won't retry lead to unnecessary performance issues. Also If you want to have one process wait for another, don't use the database, don't use a lock object (you can't; processes don't share memory); **use a file**. How can we achieve it in case multiple infra are using the same jar and we are not currently using any cloud services like AWS S3 – ABC May 23 '21 at 14:19
  • 1
    I'm truly baffled at two claims that your answer does. 'A much more common way is called MVCC (MultiVersion Concurrency Control) and is **far** faster.' Well, I mean, it's *sometimes* faster, especially in low-contention environments. To say that it's always faster and by far at that is quite an over-statement, though. Also, I'm confused by what you meant by '**Crucially, in the SERIALIZABLE model, locking NEVER occurs**'. Out of all the available isolation levels (which is just another name for 'automatic locking policies'), `SERIALIZABLE` is actually the one that does the most locking. – crizzis May 23 '21 at 15:30
  • _Just one more thing won't retry lead to unnecessary performance issues_ On the contrary. Retry leads to vast performance improvements: Yes, you have to execute the occasional retry. You trade for __zero pessimistic locks throughout the codebase__. That's a remarkable tradeoff. Like ethernet and token ring, turns out that is orders of magnitude faster. Remember, to get serializable transaction isolation without optimistic locking, a single select __has to lock the entire table__. optimistic locking solves that. – rzwitserloot May 23 '21 at 19:50
  • @crizzis MVCC isn't the same thing as serializable transaction level. MVCC Is a way databases work, and many do (including postgres). With MVCC, you _can_ implement serializable transaction level in an efficient fashion, using optimistic locking. _`SERIALIZABLE` is actually the one that does the most locking._ - this is incorrect. Optimistic locking (i.e. no locking, just retries) lets you have SERIALIZABLE without locks. Out of context, sure, that statement doesn't make sense, but it's in the middle of an entire section on optimistic locking and an explanation of the retry model. – rzwitserloot May 23 '21 at 19:52
  • 3
    @rzwitserloot I never claimed MVCC was the same thing as serializable isolation level. What I meant was that while optimistic locking may increase performance in some scenarios, it introduces the risk of having to re-run the entire critical section. So, you cannot really claim it is universally more efficient – crizzis May 23 '21 at 21:53
  • No, and I didn't - it's the only transaction level that lets a bank not lose money given the 2 code snippets - and yet on some impls, `ON UPDATE` wouldn't cause any pessimistic locking to occur at all. – rzwitserloot May 23 '21 at 23:16
  • @rzwitserloot _ON UPDATE wouldn't cause any pessimistic locking to occur at all._ **Pessimistic Locking** by definition is when you lock the record for your exclusive use until you have finished with it. `ON UPDATE` in a way does the same thing right then why above statement. **Possibility is I am not looking from the point of view you are looking into it** Please can you explain – ABC May 24 '21 at 15:24
  • @ABC If a DB uses a locking-based implementation to ensure the various guarantees each of the 4 transaction levels provides, then depending on which TL you chose, `FOR UPDATE` very much would cause a write lock to be applied to the selected row(s), preventing any other query from reading it or writing to it until that transaction completes. However, if a DB engine is using MVCC/Retry to ensure these guarantees, it doesn't have to lock anything, not even at TL.SERIALIZABLE. The point is, OP wanted to use the locking aspect for control flow, and I explain why you can't: Locking is just one strat – rzwitserloot May 24 '21 at 18:12
  • @rzwitserloot: In a lock based database, locks are normally applied on the record; not on the whole table. Only when there are too many locks, the database implementation could upgrade to more course grained lock. – pveentjer Jun 04 '21 at 04:49
  • @zwitserloot 'Crucially, in the SERIALIZABLE model, locking NEVER occurs, and FOR UPDATE does not mean anything at all' This is not correct; with an MVCC based database, a for update acquires a pessimistic lock when the SELECT is executed. This way you reduce the need for retries. – pveentjer Jun 04 '21 at 04:53
  • 2
    @zwitserloot 'If you want to have one thread wait for another, don't use the database, use a lock object.' If you are already using a database, use the database for synchronization. Do not add more tools into the mixture. – pveentjer Jun 04 '21 at 04:54
  • @zwitserloot 'MVCC/Optimistic locking DBs guarantee that the bank will never get fleeced no matter how hard you try with the shenanigans'. Look for write skew. A problem Oracle used to suffer from; not sure if they still suffer from it and some others like PostgreSQL solved it long time ago. But some MVCC databases can get behavior of transactions that can't be explained by serial execution of the transactions. – pveentjer Jun 04 '21 at 04:57