6

I am writing a Snakefile for a snakemake workflow. As part of my workflow I need to check whether a set of records in a database has changed, and if they have re-download them.

My thought was to write a rule that checks the database timestamp and writes it to an output file, then use that timestamp file as an input to my download rule. The problem is that once the timestamp file has been written, the timestamp rule will never run again, so the timestamp will never be updated.

Is there a way to make this rule run every time? (I know I can force it from the shell, but I would like to specify it in the Snakefile.) Or is there a better way to handle this?

interjay
Chris

2 Answers

7

Any code you add to a Snakefile outside of a rule or function definition will be run at startup just like a regular Python script, so you don't need an external shell script. You can implement the logic you want in Python right in the Snakefile, making use of the shell() function if you need it.

One caveat is that if you run your workflow on a cluster, the code is run again for every cluster job submitted. A crude but effective way to avoid this is to guard it with a check like this:

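import sys
from pathlib import Path

if '--nolock' not in sys.argv:  # skip this check inside submitted jobs
    if check_database_for_updates():  # your own function; returns True if records changed
        Path('touch.file').touch()  # creates the file if missing, else updates its mtime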

Then set touch.file as a proxy input to your rule that reads from the database. Does that make sense?
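To make the idea a bit more concrete, here is a minimal sketch of what the "check and touch" step could look like in plain Python. The function name `touch_if_changed`, the `db.stamp` file and the way the current timestamp is obtained are all assumptions for illustration, not anything Snakemake provides; you would get `current_stamp` from your own database query.

```python
from pathlib import Path


def touch_if_changed(current_stamp: str, stamp_file: Path, proxy_file: Path) -> bool:
    """Touch proxy_file when current_stamp differs from the one saved last run.

    Hypothetical helper: current_stamp is whatever your database reports as
    its last-modified time, passed in as a string.
    """
    previous = stamp_file.read_text().strip() if stamp_file.exists() else None
    if previous == current_stamp:
        return False  # nothing changed: leave proxy_file's mtime alone
    stamp_file.write_text(current_stamp)  # remember the stamp for the next run
    proxy_file.touch()  # creates the file if missing, otherwise bumps its mtime
    return True
```

Called at the top of the Snakefile (inside the `--nolock` guard), something like `touch_if_changed(stamp, Path('db.stamp'), Path('touch.file'))` would only bump `touch.file` when the database actually changed, so the download rule that lists `touch.file` as an input is rerun only then.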

TIM

  • 1
    Since recently, the code in the Snakefile gets executed for every single job, not only when you are on a cluster. So the recipe above is also relevant for single-machine setups. – j08lue Aug 21 '18 at 06:30
4

Since v3.6.0, the onstart handler lets you always execute something before the workflow starts. From the documentation:

Snakemake 3.6.0 adds an onstart handler, that will be executed before the workflow starts. Note that dry-runs do not trigger any of the handlers.

It's unfortunate that onstart doesn't get triggered during dry-runs.

On a similar note, the onsuccess and onerror handlers can be used to run something depending on whether the workflow succeeds or fails, respectively.

Manavalan Gajapathy