
I'm looking for a way of scheduling tasks where a task starts once several previous tasks have completed.

I have several hundred "collector" processes which collect data from a variety of sources and dump it to a database. Once these have finished collecting (anywhere from 1 second to a few minutes) I want to immediately kick off a bunch of "data-processing" processes to analyse and make sense of the data in the database. When all of these have finished I want a final task to start and send me an email of the summary data.

I'm currently using a Gearman queue and starting the data-processing tasks on timers set for when I expect the "collector" processes to have completed. This means the processing step starts after 10 minutes even if the collectors finished after 3 or, worse, have not finished yet.

Ideally I'd be able to specify specific rules like "start process X when process A and (B or C) complete", or "start process Y when 95% of the specified processes have completed or 10 minutes have elapsed".

The processes and dependencies need to be created automatically, as the job will be run with different parameters each time (i.e. I'm not doing an identical calculation each time).

I could write some kind of graph-dependency framework myself using queues and monitors, but it seems like the sort of thing that must have already been solved and I'm looking for anyone who has used something like I describe.

Crashthatch

2 Answers


"start process X when process A and (B or C) complete"

Why not let worker X launch subworkers A, B and C and wait for them to complete before proceeding? You can have a process X that is both a Gearman worker and a client at the same time.
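A minimal sketch of that idea, using Python's asyncio as a stand-in for Gearman rather than the Gearman API itself. The names `run_job`, `first_of` and `process_x` and the job durations are hypothetical. The parent X starts A, B and C itself, then proceeds once A and at least one of B or C have finished:

```python
import asyncio

async def run_job(name: str, seconds: float) -> str:
    """Hypothetical stand-in for submitting a sub-job and awaiting its result."""
    await asyncio.sleep(seconds)
    return name

async def first_of(*tasks: asyncio.Task) -> str:
    """Return the result of whichever task finishes first (the 'or')."""
    done, _pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    return next(iter(done)).result()

async def process_x() -> list:
    # X acts as the client here: it launches its own prerequisites.
    a = asyncio.create_task(run_job("A", 0.02))
    b = asyncio.create_task(run_job("B", 0.20))
    c = asyncio.create_task(run_job("C", 0.01))
    # "A and (B or C)": wait for A plus whichever of B/C completes first.
    a_result, bc_result = await asyncio.gather(a, first_of(b, c))
    # The slower of B/C is no longer needed, so cancel it.
    for task in (b, c):
        if not task.done():
            task.cancel()
    return [a_result, bc_result]

if __name__ == "__main__":
    print(asyncio.run(process_x()))
```

With real Gearman workers the waiting would happen on job handles instead of asyncio tasks, but the shape of the condition stays the same.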

Goran Rakic
  • +1, there is no reason why you can't chain Gearman queues. Client Alpha sends a job to Gearman Queue 1, which passes it to Gearman Worker 1A. As part of that job, Worker 1A acts as a Gearman client and sends a sub-job to Gearman Queue 2, which in turn dispatches it to another worker (2A or 1B, for example) – James Butler Aug 03 '11 at 11:45

You have some very peculiar conditions:

  • B or C
  • 95% complete or 10 minutes elapsed

At first I thought your processes were simply asynchronous. In that case you could use deferreds and promises. I use these a lot in JavaScript when dealing with Ajax calls for data. With them you are basically configuring a dependency graph.
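To illustrate the plain asynchronous case, here is a rough sketch in Python, where asyncio awaitables play the role of promises. The functions `collect`, `analyse` and `pipeline` are hypothetical placeholders for the collectors, the data-processing steps and the final summary. Each stage starts as soon as the previous one resolves, and the graph is built at run time from whatever parameters are passed in:

```python
import asyncio

async def collect(source: str) -> str:
    """Hypothetical collector: pretend to fetch data and store it."""
    await asyncio.sleep(0.01)
    return f"data:{source}"

async def analyse(kind: str, collected: list) -> str:
    """Hypothetical processing step that runs over everything collected."""
    await asyncio.sleep(0.01)
    return f"{kind}({len(collected)} sources)"

async def pipeline(sources: list, analyses: list) -> str:
    # Stage 1: all collectors run concurrently; gather resolves when the last one does.
    collected = await asyncio.gather(*(collect(s) for s in sources))
    # Stage 2: starts as soon as stage 1 resolves, with no fixed timer.
    reports = await asyncio.gather(*(analyse(k, collected) for k in analyses))
    # Stage 3: the final task (the summary email in the question) is just a string here.
    return "summary: " + ", ".join(reports)

if __name__ == "__main__":
    print(asyncio.run(pipeline(["db", "api", "feed"], ["mean", "trend"])))
```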

But your case is even more complex. Apparently you need an 'or', progress monitoring and timers.
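The "95% complete or 10 minutes elapsed" rule can still be expressed with a small helper. The sketch below is one possible approach in Python's asyncio, not an existing library feature: `wait_for_quorum` and `demo` are made-up names, and the timings are shortened so the example runs quickly.

```python
import asyncio
import math

async def wait_for_quorum(tasks, fraction: float, timeout: float) -> set:
    """Return the finished tasks once `fraction` of them are done
    or `timeout` seconds have elapsed, whichever comes first."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    needed = math.ceil(fraction * len(tasks))
    pending, done = set(tasks), set()
    while pending and len(done) < needed:
        remaining = deadline - loop.time()
        if remaining <= 0:
            break  # timer expired before the quorum was reached
        finished, pending = await asyncio.wait(
            pending, timeout=remaining, return_when=asyncio.FIRST_COMPLETED
        )
        done |= finished  # progress monitoring: count completions so far
    return done

async def demo() -> tuple:
    async def job(seconds: float) -> float:
        await asyncio.sleep(seconds)
        return seconds

    # 19 quick jobs plus one straggler that would take far too long.
    tasks = [asyncio.create_task(job(0.01)) for _ in range(19)]
    straggler = asyncio.create_task(job(5))
    tasks.append(straggler)
    done = await wait_for_quorum(tasks, fraction=0.95, timeout=1.0)
    straggler.cancel()  # the next stage starts without waiting for it
    return len(done), straggler in done

if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the question's setting the timeout would be 600 seconds and the tasks would wrap the real collector jobs.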

This is all very un-PHP-like. PHP has very poor cron job support, no support for asynchronous tasks and no timers. Why are you doing this in PHP?

Halcyon
  • The tasks themselves are in PHP for historical reasons: they were initially run as online processes rather than in the background using a queue. Essentially they run as Unix scripts from the command line, so I can change them if some other language or framework supports these complex dependencies better. – Crashthatch Jul 30 '11 at 12:40