
I want to ask for help with a problem. I have a program that triggers asynchronous computations in parallel and waits in a loop until they are finished.

I am using Postgres as the database, where I have created a table computation_status that contains the following data when the computations are triggered:

    computation    finished
    COMPUTATION_A  null
    COMPUTATION_B  null
    COMPUTATION_C  null

Then I wait in a loop until all computations are finished. The loop accepts a notification for each computation that finishes and runs an SQL transaction to update its status and check whether any other computation is still running. For example:

T1:

BEGIN_TRANSACTION
    update computation_status set finished = NOW() where computation = 'COMPUTATION_A'
    select exists (select 1 from computation_status where finished is null)
COMMIT

T2:

BEGIN_TRANSACTION
    update computation_status set finished = NOW() where computation = 'COMPUTATION_B'
    select exists (select 1 from computation_status where finished is null)
COMMIT

T3:

BEGIN_TRANSACTION
    update computation_status set finished = NOW() where computation = 'COMPUTATION_C'
    select exists (select 1 from computation_status where finished is null)
COMMIT
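To make the flow above concrete: a minimal sketch of such a waiting loop, assuming Python. The dictionary and the names (`status`, `on_finished`, `all_done`) are hypothetical in-memory stand-ins for the computation_status table and the notification handler; a lock stands in for the transaction boundary.

```python
import threading

# Hypothetical in-memory stand-in for the computation_status table.
status = {"COMPUTATION_A": None, "COMPUTATION_B": None, "COMPUTATION_C": None}
txn_lock = threading.Lock()
all_done = threading.Event()

def on_finished(computation):
    """Handler invoked for each 'computation finished' notification.

    Mirrors T1-T3: mark the computation finished, then check whether
    any computation is still running (finished is null).
    """
    with txn_lock:  # stands in for BEGIN ... COMMIT
        status[computation] = "now"  # update ... set finished = NOW()
        still_running = any(v is None for v in status.values())
    if not still_running:
        all_done.set()  # lets the waiting loop exit

# Simulate the three notifications arriving.
for c in ("COMPUTATION_A", "COMPUTATION_B", "COMPUTATION_C"):
    on_finished(c)
all_done.wait(timeout=1)
```

Here the single in-process lock serializes the check-and-update, which is exactly what a real database does not guarantee by default, hence the question.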

And when the last computation is finished the program exits the waiting loop.

What isolation level should I use to avoid these problems? I know I should use at least the READ COMMITTED isolation level to prevent non-repeatable reads, but is that enough? Or is it also possible that phantom reads will occur, so I should use REPEATABLE READ? (I'm not sure whether an UPDATE counts as a read too.)

I want to avoid the following problem: suppose computations A and B finish at the same time as the last two. Then T1 sets A = finished and reads that B is not finished, while T2 sets B = finished and reads that A is not finished. Neither transaction sees that everything is done, so my application ends up in an infinite loop.
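The race described above can be replayed deterministically. This sketch models snapshot semantics (each transaction's SELECT sees the state as of the moment neither peer had committed yet, as can happen under both READ COMMITTED and REPEATABLE READ when the transactions overlap); `run_race` and the dictionaries are hypothetical names for illustration.

```python
def run_race():
    """Replay T1 and T2 overlapping, each reading a pre-commit snapshot."""
    table = {"COMPUTATION_A": None, "COMPUTATION_B": None}

    # Both transactions begin before either commits: each sees this state.
    snap_t1 = dict(table)
    snap_t2 = dict(table)

    # T1: update A, then run its EXISTS check against what it can see.
    snap_t1["COMPUTATION_A"] = "now"
    t1_sees_running = any(v is None for v in snap_t1.values())  # B still null

    # T2: update B, then run its EXISTS check against what it can see.
    snap_t2["COMPUTATION_B"] = "now"
    t2_sees_running = any(v is None for v in snap_t2.values())  # A still null

    return t1_sees_running, t2_sees_running
```

Both transactions report "something is still running", so neither side ever signals the waiting loop to exit.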

1 Answer


To avoid race conditions here, you have to effectively serialize the transactions.

The only isolation level at which that works reliably is SERIALIZABLE. However, it incurs a performance penalty, and you have to be prepared to repeat a transaction whenever a serialization error is thrown. If more than one of these transactions runs concurrently, one of them will receive a serialization error.

The alternative would be to use locks, but that is not very appealing: row locks would lead to deadlocks, and table locks would block autovacuum, which would eventually bring your system down.

Laurenz Albe