
I am trying to construct a Snakemake pipeline for biosynthetic gene cluster detection but am struggling with the error:

Missing input files for rule all:
antismash-output/Unmap_09/Unmap_09.txt
antismash-output/Unmap_12/Unmap_12.txt
antismash-output/Unmap_18/Unmap_18.txt

And so on with more files. As far as I can see the file generation in the snakefile should be working:

workdir: config["path_to_files"]
wildcard_constraints:
    separator = config["separator"],
    extension = config["file_extension"],
    sample = config["samples"]

rule all:
    input:
        expand("antismash-output/{sample}/{sample}.txt", sample = config["samples"])

# merging the paired end reads (either fasta or fastq) as prodigal only takes single end reads
rule pear:
    input:
        forward = "{sample}{separator}1.{extension}",
        reverse = "{sample}{separator}2.{extension}"

    output:
        "merged_reads/{sample}.{extension}"

    conda:
        "~/miniconda3/envs/antismash"

    shell:
        "pear -f {input.forward} -r {input.reverse} -o {output} -t 21"

# If single end then move them to merged_reads directory
rule move:
    input:
        "{sample}.{extension}"

    output:
        "merged_reads/{sample}.{extension}"

    shell:
        # {path} and {sample} are not defined in a shell directive; use the declared input and output
        "cp {input} {output}"

# Setting the rule order on the 2 above rules which should be treated equally and only one run.
ruleorder: pear > move
# annotating the metagenome with prodigal. Can be done inside antiSMASH but prefer to do it out
rule prodigal:
    input:
        "merged_reads/{sample}.{extension}"

    output:
        gbk_files = "annotated_reads/{sample}.gbk",
        protein_files = "protein_reads/{sample}.faa"

    conda:
        "~/miniconda3/envs/antismash"

    shell:
        "prodigal -i {input} -o {output.gbk_files} -a {output.protein_files} -p meta"

# running antiSMASH on the annotated metagenome
rule antiSMASH:
    input:
        "annotated_reads/{sample}.gbk"

    output:
        touch("antismash-output/{sample}/{sample}.txt")

    conda:
        "~/miniconda3/envs/antismash"

    shell:
        "antismash --knownclusterblast --subclusterblast --full-hmmer --smcog --outputfolder antismash-output/{wildcards.sample}/ {input}"

This is an example of what my config.yaml file looks like:

file_extension: fastq
path_to_files: /home/lamma/ABR/Each_reads
samples:
- Unmap_14
- Unmap_55
- Unmap_37
separator: _

I cannot see where I am going wrong within the Snakefile to produce such an error. Apologies for the simple question, I am new to Snakemake.

Lamma

2 Answers


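The problem is that you set up your global wildcard constraints incorrectly. A wildcard constraint must be a regular expression (a string), but config["samples"] is a list, so join the sample names into a single alternation: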

wildcard_constraints:
    separator = config["separator"],
    extension = config["file_extension"],
    sample = '|'.join(config["samples"])  # <-- this should fix the problem
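
To see what the joined constraint looks like outside Snakemake, here is a minimal plain-Python sketch. The sample names are taken from the example config.yaml in the question; the helper matches_sample and the use of re.escape are illustrative additions, not part of the answer or of Snakemake itself.

```python
import re

# Sample names copied from the example config.yaml in the question.
samples = ["Unmap_14", "Unmap_55", "Unmap_37"]

# Join the names into one alternation pattern. re.escape is an extra
# precaution (an assumption, not in the answer) in case a sample name
# ever contains a regex metacharacter such as "." or "+".
sample_pattern = "|".join(re.escape(s) for s in samples)
print(sample_pattern)


def matches_sample(name):
    """Hypothetical helper: True only if name is exactly one configured sample."""
    return re.fullmatch(sample_pattern, name) is not None


print(matches_sample("Unmap_55"))      # a configured sample
print(matches_sample("merged_reads"))  # not a sample name
```

With the constraint written this way, the sample wildcard can only take one of the configured names, so it cannot accidentally absorb parts of a directory or file suffix.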

Then another problem immediately follows with the extension and separator wildcards. Snakemake can only infer what these should be from other filenames; you cannot actually set them through wildcard constraints. We can make use of f-string syntax to fill in what the values should be:

rule pear:
    input:
        forward = f"{{sample}}{config['separator']}1.{{extension}}",
        reverse = f"{{sample}}{config['separator']}2.{{extension}}"
    ...

and:

rule prodigal:
    input:
        f"merged_reads/{{sample}}.{config['file_extension']}"
    ...

Take a look at snakemake regex if the wildcard constraints confuse you, and find a blog about f-strings if you are confused about the f"" syntax and when to use single { and when to use double {{ to escape them.
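
As a quick illustration of the brace escaping, this plain-Python sketch shows what the f-strings above evaluate to before Snakemake ever sees them. The config dictionary is a stand-in built from the example config.yaml in the question, and the final .format call only mimics wildcard filling for demonstration; it is not how Snakemake resolves wildcards internally.

```python
# Stand-in for the values Snakemake would load from config.yaml.
config = {"separator": "_", "file_extension": "fastq"}

# Single braces are evaluated by Python right away; doubled braces
# survive as literal single braces, leaving wildcards for Snakemake.
forward = f"{{sample}}{config['separator']}1.{{extension}}"
merged = f"merged_reads/{{sample}}.{config['file_extension']}"

print(forward)  # {sample}_1.{extension}
print(merged)   # merged_reads/{sample}.fastq

# Mimic wildcard substitution for one sample, for illustration only.
print(forward.format(sample="Unmap_14", extension="fastq"))  # Unmap_14_1.fastq
```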

Hope that helps!

Maarten-vd-Sande
  • You, sir, are a damn genius! Thank you very much!! I will take a look at the resources you suggested so I can learn for next time :) – Lamma Nov 28 '19 at 14:59
  • That constraint is an amazing trick. I've been looking for something like that for a long time! – vtrubets Feb 20 '20 at 13:39

(Since I can't comment yet ...) You might have a problem with your relative paths, and we cannot see where your files actually are.

A way to debug this is to use config["path_to_files"] to build absolute paths in input:. That would give you a better error message about where Snakemake expects the files, since input/output files are resolved relative to the working directory.
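
One way to try this outside Snakemake is a small standalone check like the sketch below. It assumes the paired-end naming scheme from the question ({sample}{separator}1.{extension} and the matching 2 file) and uses the values from the example config.yaml; the function name expected_inputs is hypothetical.

```python
import os

# Values copied from the example config.yaml in the question.
config = {
    "file_extension": "fastq",
    "path_to_files": "/home/lamma/ABR/Each_reads",
    "samples": ["Unmap_14", "Unmap_55", "Unmap_37"],
    "separator": "_",
}


def expected_inputs(cfg):
    """Hypothetical helper: absolute paths of the paired-end read files."""
    paths = []
    for sample in cfg["samples"]:
        for mate in ("1", "2"):
            name = f"{sample}{cfg['separator']}{mate}.{cfg['file_extension']}"
            paths.append(os.path.join(cfg["path_to_files"], name))
    return paths


# Report which of the expected files are actually present on disk.
for path in expected_inputs(config):
    print(path, "found" if os.path.exists(path) else "MISSING")
```

Any path reported as MISSING points to a mismatch between the naming scheme in the Snakefile and the files in the working directory.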

gutorm