
I have a rather complex caching and invalidation setup where I need to recalculate a lot of data if a specific dataset changes.

Basically, if one specific entry changes, this can create up to 15 jobs recalculating data. While those jobs are running, another of the main entries may change, again creating several jobs. (This could happen simultaneously.)

What I need to achieve is to aggregate the results after the jobs have run, and it would probably make sense to do this aggregation only once.

So what I need to do is run a single job, only once after all those other jobs are finished.

(BTW: I am using BCCResqueBundle)

m0c
  • Sounds correct, but what is your question? – Tw Bert Mar 21 '14 at 10:44
  • How to achieve this. How to schedule a job when ALL others are finished. – m0c Mar 21 '14 at 15:33
  • It sounds to me like you could use a reference counter combined with a lock. The lock: either one or many async workers are running, or the aggregate job is running. The reference counter tells the number of async workers. The aggregator claims the lock only when the ref counter is zero. – Tw Bert Mar 21 '14 at 18:24
  • Ok, I think I got it, but it might be sufficient to just have a counter on the async jobs: after one job finishes I decrease the counter and then check the number of still-running jobs, and if no others are running I schedule the aggregation job. I am not sure if I need the lock. – m0c Mar 22 '14 at 10:29
  • Sounds fine. You don't need the lock if there's no risk of the worker threads starting while the aggregate job is still busy. I didn't post this as an answer, btw, because I wasn't 100% sure that was what you were asking; your question wasn't very detailed. Please close the question afterwards and/or post an answer yourself (which you accept). – Tw Bert Mar 22 '14 at 13:26
  • you can also post your approach with a counter. This is how I will now go forward. – m0c Mar 22 '14 at 13:30

1 Answer


You could use a reference counter, and if needed combine it with a lock.

The reference counter tracks the number of running async workers: increment it when a job starts, decrement it when the job finishes or fails.

The lock guarantees that either one or more async workers are running, or the aggregate job is running, but never both at once. If you use careful scheduling, you don't need the lock.
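As a minimal in-process sketch of the counter idea (in Python, since the pattern is language-agnostic; `JobCounter` is a hypothetical name, and in a real Resque setup the counter would live in Redis via INCR/DECR so that separate worker processes can share it):

```python
import threading

class JobCounter:
    """Thread-safe reference counter for outstanding recalculation jobs.

    Increment when a job is enqueued, decrement when it finishes or fails;
    the aggregation callback fires exactly once, when the count hits zero.
    """

    def __init__(self, aggregate):
        self._lock = threading.Lock()
        self._count = 0
        self._aggregate = aggregate  # called once all jobs are done

    def job_enqueued(self):
        with self._lock:
            self._count += 1

    def job_finished(self):
        # Decrement under the lock, but run the aggregation outside it
        # so a long aggregation doesn't block further bookkeeping.
        with self._lock:
            self._count -= 1
            run_aggregate = (self._count == 0)
        if run_aggregate:
            self._aggregate()
```

Note that incrementing at enqueue time (rather than at job start) avoids the race where all running jobs finish while others are still queued, which would trigger the aggregation too early.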

Tw Bert
  • I just came across an issue. I am increasing the counter within the job, but it might be that after the job is done, NO other job is running, but there might be some jobs queued, I am not sure how to find specific enqueued jobs. I could increase the counter not during job execution, but when I enqueue them. I am just afraid, that in the case when a job fails, my aggregation job will never run. – m0c Mar 22 '14 at 13:54
  • Yes, that's why I posted _decrement when finished or_ __failed__ above. Doesn't the job scheduler tell you if a job failed? Exceptional flows should be accounted for, but they require different logic. You could write a (scheduled) watchdog process for this. You could also put everything together in one job, where you first do the async work (with threads or greenlets) and after that the aggregate. I don't know how distributed your solution has to be. – Tw Bert Mar 22 '14 at 18:30
  • Afterthought: you also could put a TTL on your reference counter (if there is no cleanup to do after a failed worker process). – Tw Bert Mar 22 '14 at 18:32
  • TTL sounds like a very easy solution, probably not 100% robust, but sounds like a good starting point. If I have the time, I could also introduce a failure handler that decreases the counter. – m0c Mar 23 '14 at 16:32
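The TTL idea from the comments can be sketched as follows (an in-process Python illustration with a hypothetical `ExpiringCounter`; in a Redis-backed deployment you would instead put an EXPIRE on the counter key, refreshing it on every increment/decrement):

```python
import time

class ExpiringCounter:
    """Counter whose value is treated as zero once the TTL elapses
    without a refresh -- a crude guard against workers that die
    without decrementing (assumes no cleanup is needed for them).
    """

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.count = 0
        self.deadline = 0.0

    def incr(self):
        self.count += 1
        self.deadline = time.monotonic() + self.ttl  # refresh TTL on activity

    def decr(self):
        self.count = max(0, self.count - 1)
        self.deadline = time.monotonic() + self.ttl

    def value(self):
        # Once the TTL has expired the counter is considered stale and
        # reads as zero, so the aggregation job is no longer blocked
        # by a worker that crashed mid-flight.
        if time.monotonic() > self.deadline:
            return 0
        return self.count
```

As noted above, this is not 100% robust (a slow-but-alive worker can outlive the TTL), so pairing it with a failure handler that decrements the counter is the safer long-term fix.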