We're using the Ruby gem whenever to manage large batches of import jobs. But what if a file is still being imported when the next cron job occurs?

For example:

12am: whenever starts an import cron job for import.csv

2am: import.csv is still being imported, but the next cron job scheduled by whenever starts.

Would whenever skip that file or try to run it again? Any suggestions to make sure it doesn't try to process the same file twice?

Allen Fuller
  • Use different filenames, e.g. a timestamp-based name: `import-yyyy-mm-dd-hh-mm-ss.csv`. If an import is still running, it doesn't matter, because the filenames are unique. When the long-running import completes, check whether any other files are available and start working on those. – Marc B Nov 26 '13 at 14:01
  • Save a log of which files are queued, running, and completed. – Dan Grahn Nov 26 '13 at 14:01
  • Simply using different names for the files isn't a good solution. If the app runs slowly or stalls for some reason, then subsequent jobs are likely to run slowly or stall for the same reason, which cascades until the machine chokes. Instead, the code itself has to sense whether other instances are running, back off, and notify someone. If jobs aren't completing on time there should be a known reason, which should be analyzed and then handled in the code. – the Tin Man Nov 26 '13 at 15:30

2 Answers

1

Whenever is merely a frontend for the crontab. Whenever doesn't actually launch any of the processes, it writes a crontab that handles the actual scheduling and launching. Whenever cannot do what you're asking.

The crontab cannot do what you want either. It launches the process and that's it.

You need to implement the checking yourself in the process launched by cron. A common way of doing this is a lockfile, and there are libraries for it (e.g. http://rubygems.org/gems/lockfile).
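
As a rough sketch of the lockfile idea, the script that cron launches could take a non-blocking exclusive lock with Ruby's standard `File#flock` and skip the run if a previous one still holds it. This uses the standard library rather than the lockfile gem linked above; the lock path and the `with_import_lock` helper are assumptions made for illustration.

```ruby
require "tmpdir"

# Hypothetical lock path; any location writable by the cron user works.
LOCK_PATH = File.join(Dir.tmpdir, "import.lock")

# Runs the block only if no other run holds the lock.
# Returns true if the block ran, false if another run was in progress.
def with_import_lock(lock_path = LOCK_PATH)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |lock|
    # LOCK_NB makes flock return false right away instead of waiting.
    return false unless lock.flock(File::LOCK_EX | File::LOCK_NB)
    yield
    true
  end
end

ran = with_import_lock do
  # The actual import goes here, e.g. a hypothetical run_import("import.csv").
end
warn "Import already running, skipping this run" unless ran
```

The operating system drops the lock when the file is closed, so a crashed import does not leave a stale lock behind the way a plain marker file would.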

Depending on your situation you might be able to create other checks before launching the import.

Jakob S
1

Well, this isn't really an issue with whenever.

However, you could rename the file you want to import as soon as you start processing it (a rename finishes well within the 12am to 2am window), then move it to an archive directory once processing is done, so there is no confusion.

The next time the task runs, it should only pick up files that do not match the in-progress naming pattern (as already suggested in one of the comments).
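
The rename-then-archive flow might look roughly like this. The directory arguments, the `.processing-<timestamp>` marker, and the `claim` and `import_pending` helpers are all made up for the example; the only thing it relies on is that a rename within one filesystem is atomic, so two overlapping runs cannot both claim the same file.

```ruby
require "fileutils"

# Hypothetical marker appended while a file is being imported, e.g.
# import.csv -> import.csv.processing-20131126T000000Z
CLAIM_PATTERN = /\.processing-\d{8}T\d{6}Z\z/

# Claims a file by renaming it. Returns the new path, or nil if the file
# is already gone (for example, an overlapping run claimed it first).
def claim(path, now: Time.now)
  claimed = "#{path}.processing-#{now.getutc.strftime('%Y%m%dT%H%M%SZ')}"
  File.rename(path, claimed)
  claimed
rescue Errno::ENOENT
  nil
end

# Imports every unclaimed CSV in inbox_dir, then archives it.
def import_pending(inbox_dir, archive_dir)
  FileUtils.mkdir_p(archive_dir)
  # Claimed files no longer end in .csv, so this glob skips them.
  Dir.glob(File.join(inbox_dir, "*.csv")).sort.each do |path|
    claimed = claim(path) or next
    yield claimed # the actual import, supplied by the caller
    archived_name = File.basename(claimed).sub(CLAIM_PATTERN, "")
    FileUtils.mv(claimed, File.join(archive_dir, archived_name))
  end
end

# Usage (paths and import_csv are placeholders):
#   import_pending("inbox", "archive") { |file| import_csv(file) }
```

If the import raises, the file keeps its in-progress name and stays in the inbox, which is what makes a later check for failed imports possible.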

You might also want to add a task that checks for imports that may have failed, e.g. a file whose name includes the exact start time but which still hasn't been archived a whole day later. That task can either send some kind of notification or trigger the import again by renaming the file so it is picked up on the next run (depending on how well your rollback works).
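
A check for failed imports could be sketched like this, assuming in-progress files carry a hypothetical `.processing-<UTC timestamp>` suffix such as `import.csv.processing-20131126T000000Z`. The one-day threshold and the helper names are illustrative, not part of whenever or cron.

```ruby
# Hypothetical in-progress marker with the claim time embedded in the name.
CLAIMED = /\.processing-(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})Z\z/
STALE_AFTER = 24 * 60 * 60 # one day, in seconds

# Parses the claim time out of the filename; nil if the name doesn't match.
def claimed_at(path)
  m = CLAIMED.match(path) or return nil
  Time.utc(*m.captures.map(&:to_i))
end

# Returns in-progress files that were claimed longer ago than stale_after.
def stale_imports(inbox_dir, now: Time.now, stale_after: STALE_AFTER)
  Dir.glob(File.join(inbox_dir, "*.processing-*")).select do |path|
    started = claimed_at(path)
    started && now - started > stale_after
  end
end

# Puts a stalled file back in the queue by stripping the marker.
def requeue(path)
  original = path.sub(CLAIMED, "")
  File.rename(path, original)
  original
end

# Usage ("inbox" is a placeholder directory):
#   stale_imports("inbox").each { |f| warn "Import stalled: #{f}" }
```

Whether to call `requeue` automatically or only notify someone depends on whether a half-finished import can be safely rerun.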