I'm having some trouble with a snakemake workflow I developed. For a specific rule, the output is sometimes identified as incomplete by snakemake:

IncompleteFilesException:
The files below seem to be incomplete. If you are sure that certain files are not incomplete, mark them as complete with

    snakemake --cleanup-metadata <filenames>

To re-generate the files rerun your command with the --rerun-incomplete flag.
Incomplete files:

This rule runs several times (with different wildcard values) and only some fail with this error. Interestingly, if I rerun the workflow from scratch, the same jobs will complete with no error and other ones might produce it. Also, I manually checked the output and don't see anything wrong with it. I can resume the workflow with no problem.
I am aware of the --ignore-incomplete workaround, but I am still curious why this happens. How does snakemake decide that an output is incomplete? I should also mention that the jobs run on a PBS HPC system, though I'm not sure whether that is related.

Cornelius Roemer
soungalo

1 Answer

"Incomplete" in this context probably means that the job did not finish the way it should have, so Snakemake cannot guarantee that the output is what it should be. If your rule produces output but then fails, Snakemake will still mark the output as incomplete.

I checked the source code to see when the IncompleteFilesException is raised. Snakemake seems to mark files as complete when persistence.finished() is called, see code here.

finished() is called by postprocess(), which in turn is called from a number of places. Without knowing Snakemake inside out, it is hard to say where the problem lies. For some reason, Snakemake must think that the job didn't complete properly.
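To make the idea concrete, here is a toy model of a start/finish marker scheme. It is only a sketch of the general pattern, not Snakemake's actual code: the class ToyPersistence, the helper run_job, and the marker layout are all made up for illustration. It shows how an output file can be perfectly valid and still be flagged as incomplete when the step that confirms completion never runs.

```python
import hashlib
import tempfile
from pathlib import Path


class ToyPersistence:
    """Toy start/finish marker scheme (hypothetical, not Snakemake's real class)."""

    def __init__(self, workdir):
        self.marker_dir = Path(workdir) / "incomplete"
        self.marker_dir.mkdir(parents=True, exist_ok=True)

    def _marker(self, output):
        # One marker file per output, named after a hash of the output path.
        digest = hashlib.sha1(str(output).encode()).hexdigest()
        return self.marker_dir / digest

    def started(self, output):
        # Written before the job runs.
        self._marker(output).touch()

    def finished(self, output):
        # Removed only when the job is confirmed as finished.
        self._marker(output).unlink(missing_ok=True)

    def incomplete(self, output):
        # A leftover marker means "started, but never confirmed finished".
        return self._marker(output).exists()


def run_job(persistence, output, confirm=True):
    """Hypothetical job: writes valid output, optionally skips the confirmation."""
    persistence.started(output)
    Path(output).write_text("result\n")
    if confirm:
        persistence.finished(output)


with tempfile.TemporaryDirectory() as tmp:
    p = ToyPersistence(tmp)
    ok = str(Path(tmp) / "a.txt")
    lost = str(Path(tmp) / "b.txt")
    run_job(p, ok)
    run_job(p, lost, confirm=False)  # the finish step never ran for this job
    status = {"a.txt": p.incomplete(ok), "b.txt": p.incomplete(lost)}
    print(status)  # {'a.txt': False, 'b.txt': True}
```

Both files contain the same valid content, but only b.txt is reported as incomplete, because its marker was never cleared.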

I would look into the logs of the Snakemake runs. Some of the jobs may be failing.
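If there are many log files, a small script can help with the search. The sketch below rests on assumptions: that the logs are plain-text *.log files in one directory (for Snakemake this is typically .snakemake/log, but check your setup), and that failures show up as lines containing "Error in rule" or "Exception". The function name find_error_lines and the demo log content are made up for illustration.

```python
import tempfile
from pathlib import Path


def find_error_lines(log_dir, needles=("Error in rule", "Exception")):
    """Return (file name, line number, text) for each log line containing a needle."""
    hits = []
    for log_file in sorted(Path(log_dir).glob("*.log")):
        with open(log_file, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if any(needle in line for needle in needles):
                    hits.append((log_file.name, number, line.rstrip()))
    return hits


# Demo on a made-up log file; point log_dir at your real log directory instead.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.snakemake.log"
    demo.write_text("rule align:\nError in rule align:\nFinished job 3.\n")
    demo_hits = find_error_lines(tmp)
    print(demo_hits)  # [('demo.snakemake.log', 2, 'Error in rule align:')]
```

On a cluster it may also be worth checking the scheduler's own stdout/stderr files for the affected jobs, since a job killed by the scheduler might not leave a trace in the Snakemake log.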

If you don't want to dig into what went wrong, you can simply pass the extra CLI option --ri (short for --rerun-incomplete), and Snakemake will rerun the jobs whose output it considers incomplete from a previous run. This is safer than using --ignore-incomplete, because --ignore-incomplete treats potentially faulty output as correct.

Cornelius Roemer