0

Hi and Happy New Year :)

I'm using Postgresql and I need a way to enable a worker process to lock a specific row, while that process operates on that row. It is essentially a table of actions which should be executed once and only once. Each process should grab a different row to operate on.

During this 'operation' the worker process will calculate some value and insert into the database in a transaction multiple times (it will alter between calculating and inserting into the database).

I don't know how long each operation will take (it varies) and I need a way to unlock that row, if that process dies/gets killed or the system crashes (so other process can grab that row and finish the operations on it).

As far as I know, Postgresql's row lock last only in one transaction. I was thinking to add some flag, which would indicate if a row is locked or not into the table, but I have trouble figuring out how could I tell if that row is still being operated on, or is it hanging, because the worker process died (in the latter case it should be taken by an other worker process)? (I have a finished flag in that table to signal, that the row is finished/done being processed)

Davs
  • 489
  • 4
  • 12
  • 1
    It sounds like you want some combination of prepared transactions and / or advisory locks. Search for those two terms in the postgresql documentation and see if that helps. As for each process grabbing a row look at this generic method I posted a few days ago and see if anything jumps out at ya: http://stackoverflow.com/questions/20846631/how-to-minimize-likelihood-of-database-contention-during-update/20850343#20850343 – Scott Marlowe Jan 01 '14 at 23:08

1 Answers1

1

Your solution with the flag seems feasible and I think the only thing that is needed is to make the lock expire. Basically the way I would architect the lock is I would write a timestamp when the lock was taken and make it so the process would have to update the lock every so often (i.e. every 30 seconds) while it's still working on the record. If the process dies or otherwise fails to complete the work the lock will expire and other processes can unlock it if more than double the timeout period elapses.

When a process finishes working on a record it would clear the lock flag and mark the record as processed (again another flag).

You will probably want to have two fields: one that would store the timestamp lock flag and another that would indicate which process owns the lock (in case you care about it). I'm assuming that there is some kind of key that can be used to order the records in the table such that the concept of "next action" is meaningful.

You could use a query like this to obtain the next record to process:

 -- find the next available process and "lock" it by updating it's flag
 UPDATE actions_tabe
    SET LockFlag = @timestamp,
        Process = @processname
  WHERE Id IN (SELECT Id
            FROM actions_table
           WHERE LockFlag IS null
             AND IsComplete = '0'
             AND ScheduledTime < now()
           ORDER BY Scheduledtime ASC, Id ASC
           LIMIT 1);

 -- return the Id and Action of the record that was just marked above
 SELECT Id, Action
   FROM actions_table
  WHERE Process = @processname

Sample Fiddle here: http://sqlfiddle.com/#!11/9c120/26/1

Mike Dinescu
  • 54,171
  • 16
  • 118
  • 151
  • for that 'next action' part: There is a column 'when', which indicates when an action should be executed (like a timetable). Each process check the database periodically (every second or two) and searches for a row, which is not finished and the 'when' timestamp is in the past. Beyond that they are simply ordered by id (the order does not matter at all..each row should be executed) – Davs Jan 01 '14 at 20:16
  • @Davs, the `Id` and `when` is all you need. Is the `Id` numeric? – Mike Dinescu Jan 01 '14 at 20:18
  • That idea of periodically update the timestamp (e.g. every 30 sec) is very interesting, altought it probably should be done in a separate thread of that process..BTW I have to use Perl for this and I'm not sure how this could be achieved right now.. – Davs Jan 01 '14 at 20:18
  • yes, the Id is numeric (of type Serial - which is an auto_increment integer in postgres) – Davs Jan 01 '14 at 20:19
  • agreed on the second thread. Obviously the 30 seconds is an arbitrary value. You would want to tailor it to your application. As far as spawing multiple threads in Perl - I'm not sure how to do that. – Mike Dinescu Jan 01 '14 at 20:20
  • Yeah, I'd not rely on Perl's threads (these operations on the rows are quite critical)..Maybe I could fork off a process which would die, when the main worker process finishes it task.. – Davs Jan 01 '14 at 20:21