0

The situation:

PostgresSQL database. Application with SQL Alchemy ORM (not really important). Table with millions of rows.

Hundreds of processes access the database with that table. Each wants to select one row and perform a relatively expensive operation based on its content, then populate other tables and update that row.

The naïve approach which I used is like this:

SELECT * FROM table WHERE status = 'free';

and then right after that:

UPDATE table SET status 'in_process';

Now the problem is that those operations are not atomic, meaning that in the time between the SELECT and the UPDATE, up to 5 other processes can select that row and begin working on it (which, I remind you, is pretty expensive).

Now I know there is SELECT FOR UPDATE which locks rows. But it locks them FOR UPDATE (duh), it doesn't forbid rows from being selected.

So I guess it has to be a pretty common issue, but googling didn't helped much.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
Coykto
  • 1,014
  • 9
  • 12
  • Can you change the ownership of the table to make it unavailable? – Fact Mar 01 '19 at 06:12
  • @Fact i don't want other processes to wait, so there is no way I can lock entire table. I want other processes to select next free row. – Coykto Mar 01 '19 at 06:29
  • IMO it may be handled through a dependency work flow, task a depends on task b scenario. That way anyone looking at the flow will know there is a wait for other task to begin. – Fact Mar 01 '19 at 06:32
  • @Fact well, processes are concurrent, they do not depend on each other. The problem is really just that they sometimes try to select first free row almost at the same time so they all get that row and first one even sets status to "in_process",, but other few processes may already working in parallel on the same row. My concern is not data consistency, but the fact that processes basically waste their time and resources. – Coykto Mar 01 '19 at 07:22

2 Answers2

1

SELECT ... FOR UPDATE is a good technique, because they block each other, so nobody else with the same intent can get your row until your transaction is done.

You may add the SKIP LOCKED clause if you want to ignore rows locked by others.

An alternative that may be attractive to you is

UPDATE atable
SET status = 'in_progress'
WHERE status = 'free'
RETURNING *;
Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263
  • Thanks, but you are not right. Select for update forbids UPDATING same row, but you can SELECT it as many times as you want. And SKIP LOCKED only skips LOCKING, not SELECTing – Coykto Mar 01 '19 at 07:10
  • 3
    I said *everybody with the same intent*, by which I mean everybody who runs a `SELECT ... FOR UPDATE` too. You cannot block concurrent readers, and indeed you don't have to. You control the code, right? Then make sure that everybody who wants to modify the row uses `FOR UPDATE`. – Laurenz Albe Mar 01 '19 at 07:18
  • Again, problem is not with concurrent modifying of rows, but with concurrent selecting of them. Postgress docs say "This prevents them from being locked, modified or deleted by other transactions until the current transaction ends." on `FOR UPDATE`. It does not blocks row from being selected. And `SKIP LOCKED` means - skip locking already locked rows in selection. Not "don't look at locked rows". – Coykto Mar 01 '19 at 07:29
  • 1
    The only possibility to keep readers from your row is to either delete it (and commit the transaction) or to `LOCK` the whole table in `ACCESS EXCLUSIVE` mode. The latter is a bad idea, because it will harm concurrency and performance severely and also block autovacuum, which will lead to table bloat eventually. You should reconsider your requirements. – Laurenz Albe Mar 01 '19 at 07:36
0

Seems like this is a way to solve this:

Using python and sqlalchemy (but it's not nessesary, since i use raw SQL anyway)

from sqlalchemy import text
sql = text("UPDATE table 
            SET status = 'in_process' 
            WHERE column.id = (SELECT column.id 
                               FROM table 
                               WHERE status='free' 
                               AND pg_try_advisory_xact_lock(column.id) 
                               LIMIT 1 FOR UPDATE) 
            RETURNING *"
row = next(iter(engine.execution_options(autocommit=True).execute(sql)))
# Now row is a tuple of values
Coykto
  • 1,014
  • 9
  • 12