I am trying to run a simple one-rule Snakemake file, as follows:

resources_dir='resources'

rule downloadReference:
    output:
        fa = resources_dir+'/human_g1k_v37.fasta',
        fai = resources_dir+'/human_g1k_v37.fasta.fai',
    shell:
        ('mkdir -p '+resources_dir+'; cd '+resources_dir+'; ' +
        'wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz; gunzip human_g1k_v37.fasta.gz; ' +
        'wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.fai;')

But I get the following error:

    Error in job downloadReference while creating output files 
    resources/human_g1k_v37.fasta, resources/human_g1k_v37.fasta.fai.
    RuleException:
    CalledProcessError in line 10 of 
    /lustre4/home/masih/projects/NGS_pipeline/snake_test:
    Command 'mkdir -p resources; cd resources; wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz; gunzip human_g1k_v37.fasta.gz; wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.fai;' returned non-zero exit status 2.
      File "/lustre4/home/masih/projects/NGS_pipeline/snake_test", line 10, in __rule_downloadReference
      File "/home/masih/miniconda3/lib/python3.6/concurrent/futures/thread.py", line 55, in run
    Removing output files of failed job downloadReference since they might be corrupted:
    resources/human_g1k_v37.fasta
    Will exit after finishing currently running jobs.
    Exiting because a job execution failed. Look above for error message

I am not using the threads option in Snakemake, so I cannot figure out how this is related to thread.py. Does anybody have experience with this error?

bli
Masih
  • To debug this, I would advise you to add this to your snakefile: http://paste.ubuntu.com/24898100/ Then you can append `|| error_exit "some error message"` to each individual shell command, in order to know at what step the failure occurs. – bli Jun 19 '17 at 08:33
  • Seems like it's the gunzip command that fails. Can't figure out why though. It throws a warning "gzip: human_g1k_v37.fasta.gz: decompression OK, trailing garbage ignored". However, gunzip works fine on the command line outside Snakemake... – rioualen Jun 19 '17 at 10:43
  • @rioualen I get the same error. It is weird that gunzip works fine outside Snakemake but throws an error when run from Snakemake! – Masih Jun 20 '17 at 13:19
  • So I was mistaken with my ineffective `cd` hypothesis... – bli Jun 20 '17 at 14:26

1 Answer

When a shell command fails, it has an exit status which is not 0. This is what "returned non-zero exit status 2" indicates.
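As a minimal sketch of how exit statuses work (not part of the original workflow), the snippet below runs a stand-in command that deliberately exits with status 2 and records that status from `$?`. The subshell `(exit 2)` is only a placeholder for a failing command such as the one in the question.

```shell
# Run a placeholder command that exits with status 2, and record the
# status without letting the failure abort the script.
status=0
(exit 2) || status=$?

# $status now holds the exit status of the failed command.
echo "exit status: $status"   # prints "exit status: 2"
```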

One of your shell commands fails, and the failure is propagated to Snakemake. I suppose that Snakemake uses threads and that the failure manifests itself at the level of some code in the thread.py file [1].

In order to better understand what is happening, we can capture the first error using the || operator followed by a function issuing an error message:

# Define functions to be used in shell portions
shell.prefix("""
# http://linuxcommand.org/wss0150.php
PROGNAME=$(basename $0)

function error_exit
{{
#   ----------------------------------------------------------------
#   Function for exit due to fatal program error
#       Accepts 1 argument:
#           string containing descriptive error message
#   ----------------------------------------------------------------
    echo "${{PROGNAME}}: ${{1:-"Unknown Error"}}" 1>&2
    exit 1
}}
""")

resources_dir='resources'

rule downloadReference:
    output:
        fa = resources_dir+'/human_g1k_v37.fasta',
        fai = resources_dir+'/human_g1k_v37.fasta.fai',
    params:
        resources_dir = resources_dir
    shell:
        """
        mkdir -p {params.resources_dir}
        cd {params.resources_dir}
        wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz || error_exit "fasta download failed"
        gunzip human_g1k_v37.fasta.gz || error_exit "fasta gunzip failed"
        wget ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.fai || error_exit "fai download failed"
        """

When I run this, I get the following message after the messages of the first download:

gzip: human_g1k_v37.fasta.gz: decompression OK, trailing garbage ignored
bash: fasta gunzip failed

It turns out that gzip uses a non-zero exit code in case of warnings:

Exit status is normally 0; if an error occurs, exit status is 1. If a warning occurs, exit status is 2.

(from the DIAGNOSTICS section of man gzip)

If I remove the error-capturing || error_exit "fasta gunzip failed", the workflow is able to complete. So I don't understand why you had this error in the first place.

I'm surprised that gzip authors decided to use a non-zero status in case of a simple warning. They added a -q option to turn off this specific warning, due to the presence of trailing zeroes, but strangely, the exit status is still non-zero when this option is used.
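Given the exit-status behaviour quoted from the man page, one possible workaround is to treat status 2 as success while still failing on any other non-zero status. The sketch below is only an illustration: `allow_warning` is a hypothetical helper name, not something provided by gzip or Snakemake, and `sh -c 'exit N'` stands in for the real `gunzip` call.

```shell
# Hypothetical helper: run a command, treat exit status 2 (a warning,
# per the gzip man page) as success, and pass any other status through.
allow_warning() {
    _aw_status=0
    "$@" || _aw_status=$?
    if [ "$_aw_status" -eq 2 ]; then
        echo "warning ignored: '$*' exited with status 2" 1>&2
        return 0
    fi
    return "$_aw_status"
}

# Stand-in for a gunzip call that ends with a warning (status 2).
allow_warning sh -c 'exit 2' && echo "status 2 accepted"

# A real error (status 1) is still reported as a failure.
allow_warning sh -c 'exit 1' || echo "status 1 still fails"
```

In the rule, this would be used as `allow_warning gunzip human_g1k_v37.fasta.gz`. If the function is defined inside `shell.prefix` or a `shell:` block, its braces would need to be doubled (`{{` and `}}`), as in the `error_exit` definition above.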


[1] According to Johannes Köster, author of Snakemake:

Sorry for the misleading thread.py thing, this is just the place where snakemake detects the problem. The real issue is that your command exits with exit code 2, which indicates an error not related to Snakemake

bli
  • The error_exit function does not make any change to the output of this code. Where does it write the failure report? It is not in the directory where the snakefile is. – Masih Jun 20 '17 at 13:24
  • The `error_exit` function displays a useful error message when one of the steps it is appended to fails, and exits the shell without trying to execute the next steps. The error message should appear along with the rest of the output displayed by Snakemake, but it doesn't go to a file. – bli Jun 20 '17 at 14:21
  • Actually, the problem is with the links. After changing the links, this code works just fine. – Masih Jun 22 '17 at 12:54