I have four tasks (t1, t2, t3, t4) that need to run in sequence on an item (a URL) every 7 days. I use gearman to run the tasks and a cronjob to send the items to the gearman queue. Each task for an item has a date_run assigned to it. If date_run for t1 is more than 7 days old, t1 is sent to the queue. If date_run for t2 is earlier than date_run for t1, t2 is sent to the queue, and so on for t3 and t4.
The problem arises when t1 for an item has been queued but hasn't finished before the cronjob runs again. Since date_run is not updated until the task completes, it looks as if the task hasn't been queued, and I end up with duplicates of t1 for the same item in the queue.
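To make the rule concrete, here is roughly what the cronjob decides for one item, as a minimal Python sketch. The function name `next_task` and the dict of dates are made up for illustration (my real code reads from the database), and I'm assuming that t1 is due once its last run is more than 7 days old and that only the first due task is queued per cron run.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(days=7)
ORDER = ["t1", "t2", "t3", "t4"]


def next_task(date_run, now):
    """Return the task the cronjob would queue for one item, or None.

    date_run maps task name -> datetime of its last completed run
    (hypothetical layout, for illustration only).
    """
    # t1 is due when its last completed run is more than 7 days old.
    if now - date_run["t1"] > INTERVAL:
        return "t1"
    # A later task is due when it last ran before its predecessor did.
    for prev, task in zip(ORDER, ORDER[1:]):
        if date_run[task] < date_run[prev]:
            return task
    return None
```

Because the decision depends only on date_run, two cron runs that see the same dates return the same task.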
The solutions I've thought of are:
- Add a unique identifier to each task and check whether it has already been queued
- Just check whether the queue is empty and don't queue any more tasks until it is
- Add a date_queued column to the item table and use it instead of t1's date_run to schedule the tasks every 7 days
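
For the date_queued idea, this is the kind of thing I have in mind, as a rough Python sketch using sqlite3 so it can be run on its own. The table layout (`item` with `id`, `url`, `date_queued`) and the function name `claim_due_items` are assumptions for illustration. The gearman submission itself is left out: the caller would send the returned ids to the queue.

```python
import sqlite3
from datetime import datetime, timedelta

INTERVAL = timedelta(days=7)


def claim_due_items(conn, now):
    """Stamp date_queued on every due item and return the ids to queue.

    An item is due if it was never queued, or was last queued more
    than 7 days before `now`. Dates are stored as ISO 8601 strings,
    which compare correctly as text.
    """
    cutoff = (now - INTERVAL).isoformat(timespec="seconds")
    stamp = now.isoformat(timespec="seconds")
    with conn:  # commit the updates together, roll back on error
        rows = conn.execute(
            "SELECT id FROM item "
            "WHERE date_queued IS NULL OR date_queued < ? "
            "ORDER BY id",
            (cutoff,),
        ).fetchall()
        ids = [row[0] for row in rows]
        # Mark the items as queued right away, before the task has run,
        # so the next cron run does not pick them up again.
        conn.executemany(
            "UPDATE item SET date_queued = ? WHERE id = ?",
            [(stamp, item_id) for item_id in ids],
        )
    return ids
```

With this, a second cron run five minutes later gets an empty list for the same item, even if t1 is still waiting in the queue. The downside I can see is that if a job gets lost, the item isn't picked up again until the next 7-day window.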
I thought I'd check on Stack Overflow first, though: is there a "best way" to solve this problem? I can't seem to get my head around it. :S
Thanks!