Is there a way to prevent snakemake from making a directory for output that doesn't exist yet?

fimo from the MEME suite annoyingly fails at the end of a run if its output directory already exists.

My workaround is to have fimo write to a different directory than the one named in the rule's output and then move the results over, but I was wondering if there is a more straightforward or elegant solution.

Example given:

    rule generate_scan:
        output:
            PROJECT_BASE + '/results/fimo_scan/fimo.txt'
        params:
            genome = '/home/hjp/ImmuneProject/hg19_reference/hg19.fa',
            motif_database = PROJECT_BASE + '/motif_databases/HUMAN/HOCOMOCOv10_HUMAN_mono_meme_format.meme',
            tmp = 'results/tmp_fimo'
        shell:
            '/home/hjp/meme/bin/fimo'
            ' -o {params.tmp}'
            ' --motif GATA2_HUMAN.H10MO.A'
            ' {params.motif_database}'
            ' {params.genome}'
            ' && '
            'mv {params.tmp}/* {PROJECT_BASE}/results/fimo_scan/'
            ' && '
            'rm -rf {params.tmp}'

Thanks in advance!

Harold

2 Answers

Currently, you can't prevent this directly in Snakemake (most tools complain about the opposite case, a missing output directory). However, I'd just prepend an rm -r on the output directory to the actual invocation of fimo.
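
A minimal sketch of that suggestion applied to the rule from the question. It assumes PROJECT_BASE is defined earlier in the Snakefile, as in the question, and that nothing else in the output directory needs to be kept; the outdir param name is introduced here only for illustration.

```snakemake
# Sketch only: PROJECT_BASE is assumed to be defined earlier in the Snakefile.
# 'outdir' is a hypothetical param name pointing at the directory that holds
# the declared output file. Snakemake creates that directory before the job
# runs, so it is removed first and fimo is left to create it again.
rule generate_scan:
    output:
        PROJECT_BASE + '/results/fimo_scan/fimo.txt'
    params:
        genome = '/home/hjp/ImmuneProject/hg19_reference/hg19.fa',
        motif_database = PROJECT_BASE + '/motif_databases/HUMAN/HOCOMOCOv10_HUMAN_mono_meme_format.meme',
        outdir = PROJECT_BASE + '/results/fimo_scan'
    shell:
        'rm -r {params.outdir}'
        ' && '
        '/home/hjp/meme/bin/fimo'
        ' -o {params.outdir}'
        ' --motif GATA2_HUMAN.H10MO.A'
        ' {params.motif_database}'
        ' {params.genome}'
```

Compared with the workaround in the question, this drops the temporary directory and the mv/rm steps: fimo writes straight into the directory that contains the declared output file.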

Johannes Köster
  • Has there been any change, or is this still the only option? – soungalo Jun 14 '21 at 09:32
  • No change in this regard yet. I also think adding some kind of syntax for this to Snakemake would just make the corresponding rule more crowded. Better to "hide" this detail in the shell command via the rm. – Johannes Köster Jun 15 '21 at 10:36

I also use the rm -rf approach, but if a tool fails in the middle of a run and can restart where it left off (e.g. CellRanger hitting a cluster time limit), you end up wasting a lot of computation by deleting the directory. Meanwhile, CellRanger needs to create the directory itself or else it will not run. The touch option in Snakemake can be used, but then you cannot easily refer to CellRanger outputs as inputs for other rules.