We are seeing the following problem on a postgres database, which runs two processes.
Process #1 examines a table for new records, by do a SELECT on the table. No transaction is in play at this point.
It then iterates through the records one record at a time, doing some non-database related work on each and then does an update on that single row inside a serialized isolation level transaction. It sets a flag in the record to indicate that the row has been processed and then commits the transaction
It then moves on to the next record in the Select and repeats. As mentioned each update is wrapped by a transaction.
While this is going on process #2 (which is responsible for the next phase of processing) detects records in the table with the flag set, using its own SELECT which is not in a transaction.
It then performs an UPDATE inside a transaction on all of these records at once using 'WHERE id IN [list of ids]'
The fact that these records are picked up in process #2s SELECT indicates that process#1 has updated them AND committed.
However occasionally we are getting a 40001 transaction locking error in process #2 when it does the update, suggesting that one of the ids is still locked. Which makes no sense.
The time between the failure and the last of the ids being committed (and therefore presumably unlocked) by process#1 is normally around 800ms. However this does not mean that process #1 has finished its run through the records in its SELECT, there may be more to do. But the ones it has done at this point are the ones picked up by process #2
No other processes are running on the database.
Can anyone suggest what the problem might be. Why are locks still in place after a commit.
Thanks for reading