
I'm using node.js, node-postgres and Postgres to put together a script to process quite a lot of data from a table. I'm using the cluster module as well, so I'm not stuck with a single thread.

I don't want one of the child processes in the cluster duplicating the processing of another. How can I update the rows I just received from a select query without the possibility of another process or query having also selected the same rows?

I'm assuming my SQL query will look something like:

BEGIN;
SELECT * FROM mytable WHERE ... LIMIT 100;
UPDATE mytable SET status = 'processing' WHERE ...;
COMMIT;

Apologies for my poor knowledge of Postgres and SQL, I've used it once before in a simple PHP web app and never before with node.js.

Thomas Foster
  • You'd want to write a `FOR UPDATE` using something like this: http://stackoverflow.com/questions/18879584/postgres-select-for-update-in-functions – Anthony Jan 02 '15 at 12:08
  • @Anthony I'll take a look at `FOR UPDATE`, I'm not sure if it does exactly what I'm wanting to do though. – Thomas Foster Jan 02 '15 at 12:21

2 Answers

1

If you're using a multithreaded application you cannot (and should not) use `FOR UPDATE` — in the main thread, anyway. What you need is an advisory lock. Each thread can query a row or many rows, verifying that they're not locked, and then lock them so no other session uses them. It's as simple as this within each thread:

SELECT * FROM mytab
WHERE pg_try_advisory_lock(mytab.id)
LIMIT 100;

At the end, be sure to release the locks using `pg_advisory_unlock`.
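A minimal sketch of that release step, assuming the application passes in the ids it just finished as a hypothetical array parameter `$1` (the key handed to `pg_advisory_unlock` must match the one given to `pg_try_advisory_lock`, here the row id):

```sql
-- Release the advisory locks taken by the SELECT above.
SELECT pg_advisory_unlock(id)
FROM mytab
WHERE id = ANY($1);  -- $1: array of ids this worker has finished with
```

Note that advisory locks are held by the session (or transaction, for the `pg_advisory_xact_lock` variants), so a worker that disconnects releases its locks automatically.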

Joe Love
  • this has the potential to lock all the rows in the table but only return a subset of them. – Jasen Jan 05 '15 at 01:33
0
BEGIN;
UPDATE mytable SET status = 'processing' WHERE status <> 'processing' AND id IN
( SELECT id FROM mytable WHERE status <> 'processing' LIMIT 100 ) RETURNING *;
COMMIT;

There's a chance that will fail if some other query was working on the same rows, so if you get an error, retry until you get some data or no rows returned.

If you get zero rows, either you're finished or there are too many other simultaneous processes like yours.
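A variant of the same query, using the `FOR UPDATE` row locking mentioned in the question's comments, makes a concurrent worker wait on the inner select instead of failing. This is a sketch assuming the default READ COMMITTED isolation; after the winning transaction commits, the blocked worker re-checks the `status` condition and may come back with fewer than 100 rows (possibly zero), so the zero-rows check above still applies:

```sql
BEGIN;
UPDATE mytable SET status = 'processing'
WHERE id IN (
    -- FOR UPDATE locks the candidate rows, so a concurrent
    -- worker blocks here rather than grabbing the same batch
    SELECT id FROM mytable
    WHERE status <> 'processing'
    LIMIT 100
    FOR UPDATE
)
RETURNING *;
COMMIT;
```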

Jasen