snakemake: pipeline failed with MissingOutputException

Question

To make it clear I have to modify the post. The situation is that at the beginning I ran the pipeline well on a local machine but failed when submitted to cluster. After posting the question, I found the version of snakemake was 3.13.3, so I updated to v5.7.3, and then I found it failed on both of the local machine and cluster. Thus, I am now struggling to figure what' wrong with my Snakefile or anything else. The error message:

Waiting at most 5 seconds for missing files.
MissingOutputException in line 24 of /work/path/rna_seq_pipeline/Snakefile:
Missing files after 5 seconds:
bam/A2_Aligned.toTranscriptome.out.bam
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /work/path/rna_seq_pipeline/.snakemake/log/2019-11-07T153434.327966.snakemake.log

So maybe something wrong with my snakamake file. Here is my Snakefile:

# config file 
configfile: "config.yaml"

shell.prefix("source ~/.bash_profile")

# determine which genome reference you would like to use
# here we are using GRCm38
# depending on the freeze, the appropriate references and data files will be chosen from the config
freeze = config['freeze']

# read list of samples, one per line
with open(config['samples']) as f:
    SAMPLES = f.read().splitlines()

rule all:
    input:
        starindex = config['reference']['stargenomedir'][freeze] + "/" + "SAindex",
        rsemindex = config['reference']['rsemgenomedir'][freeze] + ".n2g.idx.fa",
        fastqs = expand("fastq/{file}_{rep}_paired.fq.gz", file = SAMPLES, rep = ['1','2']),
        bams = expand("bam/{file}_Aligned.toTranscriptome.out.bam", file = SAMPLES),
        quant = expand("quant/{file}.genes.results", file = SAMPLES)

# align using STAR
rule star_align:
    input:
        f1 = "fastq/" + "{file}_1_paired.fq.gz",
        f2 = "fastq/" + "{file}_2_paired.fq.gz"
    output:
        out = "bam/" + "{file}_Aligned.toTranscriptome.out.bam"
    params:
        star = config['tools']['star'],
        genomedir = config['reference']['stargenomedir'][freeze],
        prefix = "bam/" + "{file}_"
    threads: 12
    shell:  
        """
        {params.star} \
        --runThreadN {threads} \
        --genomeDir {params.genomedir} \
        --readFilesIn {input.f1} {input.f2} \
        --readFilesCommand zcat \
        --outFileNamePrefix {params.prefix} \
        --outSAMtype BAM SortedByCoordinate \
        --outSAMunmapped Within \
        --quantMode TranscriptomeSAM \
        --outSAMattributes NH HI AS NM MD \
        --outFilterType BySJout \
        --outFilterMultimapNmax 20 \
        --outFilterMismatchNmax 999 \
        --outFilterMismatchNoverReadLmax 0.04 \
        --alignIntronMin 20 \
        --alignIntronMax 1000000 \
        --alignMatesGapMax 1000000 \
        --alignSJoverhangMin 8 \
        --alignSJDBoverhangMin 1 \
        --sjdbScore 1 \
        --limitBAMsortRAM 50000000000
        """

# quantify expression using RSEM
rule rsem_quant:
    input:
        bam = "bam/" + "{file}_Aligned.toTranscriptome.out.bam"
    output:
        quant = "quant/" + "{file}.genes.results"
    params:
        calcexp = config['tools']['rsem']['calcexp'],
        genomedir = config['reference']['rsemgenomedir'][freeze],
        prefix =  "quant/" + "{file}"
    threads: 12
    shell:
        """
        {params.calcexp} \
        --paired-end \
        --no-bam-output \
        --quiet \
        --no-qualities \
        -p {threads} \
        --forward-prob 0.5 \
        --seed-length 21 \
        --fragment-length-mean -1.0 \
        --bam {input.bam} {params.genomedir} {params.prefix}

And my config.yaml:

freeze: grcm38

# samples file
samples:
    samples.txt

# software, binaries or tools
tools:
    fastqdump: fastq-dump
star: STAR
rsem: 
    calcexp: rsem-calculate-expression
    prepref: rsem-prepare-reference

# reference files, genome indices and data
reference:
    stargenomedir: 
        grch38: /work/path/reference/STAR/GRCh38
        grcm38: /work/path/reference/STAR/GRCm38
    rsemgenomedir: 
        grch38: /work/path/reference/RSEM/GRCh38/GRCh38
        grcm38: /work/path/reference/RSEM/GRCm38/GRCm38
    fasta: 
        grch38: /work/path/GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa
        grcm38: /work/path/reference/GRCm38/Mus_musculus.GRCm38.dna.primary_assembly.fa
    gtf: 
        grch38: /work/path/reference/GRCh38/Homo_sapiens.GRCh38.98.gtf
        grcm38: /work/path/reference/GRCm38/Mus_musculus.GRCm38.98.gtf

And finally, samples.txt:

A1
A2

Any suggestion?

ps: adapted from the pipeline https://github.com/komalsrathi/rnaseq-star-rsem-pipeline/blob/master/Snakefile

Difficult to say what goes wrong. Try executing the command which is printed when the rule fails. Are you sure that the output file gets the correct name? It looks like the prefix is not the same as the output name. — Maarten-vd-Sande, Nov 07 '19 at 09:20
Sorry, actually I want to use `/work/path/...` to hide my personal information, but forgot to modify it in the error message. That's not the problem. I used `snakemake -np` to create shell command and run it manually. The command runs successfully except I need to create directory `bam/` and `quant/` manually, but I think snakemake can automatically mkdir, isn't it? — Dianyu Chen, Nov 07 '19 at 10:07

Maarten-vd-Sande · Answer 1 · 2019-11-06T15:05:14.110

0

When touch complains cannot touch: : No such file or directory, this generally means that the directory structure does not exist. What happens if you try:

touch /work/path/rna_seq_pipeline/.snakemake/tmp.o_2ffebs/1.jobfailed

The missing files line indicates that you are trying to store output in the .snakemake folder. Is that true? What happens if you move this to a for instance the current working directory?

edited Nov 06 '19 at 15:05

answered Nov 06 '19 at 14:59

Maarten-vd-Sande

3,413
10
27

I can not create file by `touch /work/path/rna_seq_pipeline/.snakemake/tmp.o_2ffebs/1.jobfailed` because directory `tmp.o_2ffebs` does not exist (`.snakemake` exists), so it seems that submitting snakemake to cluster does not create a temp directory successfully or before that the workflow has already move on – Dianyu Chen Nov 07 '19 at 01:10
Do you get a log of the rule? Why is the job failing? Is it qsub related? – Maarten-vd-Sande Nov 07 '19 at 05:41
I found I used snakemake v3.13.3 before, and after updating snakemake to 5.7.4, running it on the local machine also failed. I have updated my original post – Dianyu Chen Nov 07 '19 at 08:46

score 0 · Accepted Answer · answered Nov 08 '19 at 04:55

0

Same post on Biostars and also answer there. https://www.biostars.org/p/406693/#406907

answered Nov 08 '19 at 04:55

Dianyu Chen

1
2

This answer doesn't work for SLURM systems. – James Dalgleish May 22 '22 at 01:31

snakemake: pipeline failed with MissingOutputException

2 Answers2