snakemake - do not delete output of failed rules

Question

I have a snakemake workflow containing a rule that runs another "inner" snakemake workflow.
Sometimes a certain rule of the inner workflow fails, which means the inner workflow fails. As a result, all files listed under the output of the inner workflow are deleted by the outer workflow, even if the rules of the inner workflow that created them completed successfully.
Is there a way to prevent snakemake from deleting the outputs of failed rules? Or maybe you can suggest another workaround?
A few notes:

The outputs of the inner workflow must be listed, because they are used as input for other rules in the outer workflow.
I tried setting the outputs of the inner workflow as protected, but this didn't help.
I've also tried adding exit 0 to the end of the call to the inner workflow to make snakemake think it completed successfully, like this:

rule run_inner:
    input:
        inputs...
    output:
        outputs...
    shell:
        """
        snakemake -s inner.snakefile
        exit 0
        """

but outputs are still deleted.
Would appreciate any help. Thanks!

Related: https://stackoverflow.com/questions/55419603 – sappjw Jan 04 '23 at 19:27

score 4 · Answer 1 · answered Jan 04 '23 at 13:41

You can use the --keep-incomplete option to snakemake, either on the command line or via a profile. This will prevent removal of incomplete output files by failed jobs.

answered Jan 04 '23 at 13:41

sappjw


score 1 · Accepted Answer · answered Dec 23 '20 at 13:28

One option may be to have run_inner produce a dummy output file that flags the completion of the rule. Rules following run_inner will take the dummy file as input. E.g.:

rule run_inner:
    ...
    output:
        # or just 'run_inner.done' if wildcards are not involved
        touch('{sample}.run_inner.done'),
    shell:
        'snakemake -s inner.snakefile'

rule next:
    input:
        '{sample}.run_inner.done',
    params:
        real_input= '{sample}.data.txt', # This is what run_inner actually produces
    shell:
        'do stuff {params.real_input}'

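If snakemake -s inner.snakefile fails, the dummy output will be deleted, but the files already produced by the inner workflow are not listed as outputs of run_inner and are therefore kept. The next run of snakemake -s inner.snakefile will resume from where it left off.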

Another option could be to integrate the rules in inner.snakefile into your outer pipeline using e.g. the include statement. I feel this option is preferable but, of course, it would be more complicated to implement.

Thanks! `include` was exactly what I needed and is actually pretty straightforward to use. — soungalo, Dec 23 '20 at 16:42

Dmitry Kuzminov · Answer 3 · 2020-12-23T17:51:02.563

0

One workaround is to use run instead of shell:

rule run_inner:
    input:
        inputs...
    output:
        outputs...
    run:
        shell("""snakemake -s inner.snakefile""")
        # Add your code here to store the files before removing

Even if the script in the shell function call fails, the files still exist until the script in the run section finishes. You may copy the files to a safe place there.

Update: You need to handle exceptions so that execution continues when the script returns an error. The script below illustrates the idea: the print call in the except: block prints True, while the one in onerror prints False.

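import os

rule run_inner:
    output:
        "output.txt"
    run:
        try:
            # Create the output, then fail on purpose
            shell("""touch output.txt; exit 1""")
        except Exception:
            # Still inside the run block
            print(os.path.exists("output.txt"))

onerror:
    # Handler called by Snakemake when the workflow fails
    print(os.path.exists("output.txt"))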
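
The "copy the files to a safe place" step is left open above. A minimal sketch of such a helper in plain Python follows; the name rescue_outputs and the default directory rescued_outputs are hypothetical and not part of Snakemake.

```python
import shutil
from pathlib import Path


def rescue_outputs(paths, backup_dir="rescued_outputs"):
    """Copy every existing file in `paths` into `backup_dir`.

    Returns the list of backup copies that were written. Paths that do
    not exist (outputs the failed job never produced) are skipped.
    Note: files sharing a basename overwrite each other in this sketch.
    """
    backup = Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    rescued = []
    for path in map(Path, paths):
        if path.is_file():
            target = backup / path.name
            shutil.copy2(path, target)  # copy2 also preserves timestamps
            rescued.append(target)
    return rescued
```

Inside the rule this might be called as rescue_outputs(output) in the except block, followed by raise if the job should still be reported as failed. Moving the copies back to their expected paths before a rerun would still have to be done separately.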

edited Dec 23 '20 at 17:51

answered Dec 23 '20 at 04:54


Thanks, that's an interesting direction, but how would I get the output files back to their expected path this way? I'll need them there in order for the workflow to resume. – soungalo Dec 23 '20 at 11:19
I don't think this is going to work. If the `shell` command fails, snakemake will delete the files listed in `output` and exit. It will not go on and execute the rest of the `run` directive. – dariober Dec 23 '20 at 13:18
@dariober, don't think, just test. It is not the `shell()` function that removes the outputs, but the cleanup procedure of the rule. This procedure is not called until the `run` script finishes. – Dmitry Kuzminov Dec 23 '20 at 16:58
@DmitryKuzminov I tested it and I still don't think the run script runs to the end. In the run directive I put `shell("touch {output}; exit 1"); os.rename(output[0], output[0] + '.done')`. `shell` creates the output and fails, so `os.rename` is never executed and `output[0] + '.done'` is not created. Can you post a reproducible example? – dariober Dec 23 '20 at 17:19
@dariober, hm, my initial script was incorrect. After wrapping the `shell` call in try/except it starts working. – Dmitry Kuzminov Dec 23 '20 at 17:52

liagy · Answer 4 · 2021-04-17T03:50:02.907

The program "fails" when it returns a non-zero exit status. Therefore we only need to "fix" this to trick the shell into thinking that every command finished successfully. The easiest way is to append || true to the command that may fail. Below is a minimal example:

rule test:
    output:
        "test.output",
    shell:
        """
        touch test.output
        # below cat will trigger error 
        cat file_not_exist || true
        """

You'll find that despite the error thrown by cat, test.output still survives.
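
Why the trailing exit 0 from the question had no effect while || true does can be reproduced outside Snakemake. The sketch below assumes, as the Snakemake documentation describes, that shell commands are run in bash strict mode, and it mimics only the set -e part of that mode with plain sh. The helper name exit_status is made up for illustration.

```python
import subprocess


def exit_status(script: str) -> int:
    """Run `script` under `sh` with `set -e` and return its exit status.

    `set -e` stands in for the strict mode Snakemake is assumed to apply
    to shell blocks: the script aborts at the first failing command.
    """
    completed = subprocess.run(
        ["sh", "-c", "set -e\n" + script],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode


# The failing command aborts the script, so `exit 0` is never reached.
status_exit0 = exit_status("cat file_not_exist\nexit 0")

# `|| true` makes the whole command list succeed, so `set -e` does not abort.
status_or_true = exit_status("cat file_not_exist || true")

print(status_exit0 != 0, status_or_true == 0)
```

Keep in mind that || true hides every failure of that command, so the outer workflow can no longer tell whether the inner one actually finished.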
