
I am trying to run a Snakemake file, but it is producing a weird result:

refseq = 'refseq.fasta'
reads = '_R1_001'
reads2 = '_R2_001'
configfile: "config.yaml"
## Add config

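def getsamples():
    import glob
    # Strip the trailing "_R1_001.fastq.gz" / "_R2_001.fastq.gz" to get the sample name;
    # the set removes the duplicate that each R1/R2 pair would otherwise produce.
    samples = sorted({f.rsplit('_', 2)[0] for f in glob.glob("*.fastq.gz")})
    print(samples)
    return samples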

def getbarcodes():
    with open('unique.barcodes.txt') as file:
        lines = [line.rstrip() for line in file]
    return(lines)

rule all:
    input:
        expand("called/{barcodes}{sample}_called.vcf", barcodes=getbarcodes(), sample=getsamples()),
        expand("mosdepth/{barcodes}{sample}.mosdepth.summary.txt", barcodes=getbarcodes(), sample=getsamples())


rule fastq_grep:
    input:
        R1 = "{sample}_R1_001.fastq.gz",
        R2 = "{sample}_R2_001.fastq.gz"
    output:
        R1 = "grepped/{barcodes}{sample}_R1_001.plate.fastq",
        R2 = "grepped/{barcodes}{sample}_R2_001.plate.fastq"
    shell:
        "fastq-grep -i '{wildcards.barcodes}' {input.R1} > {output.R1} && fastq-grep -i '{wildcards.barcodes}' {input.R2} > {output.R2}"



My directory contains files ending in .fastq.gz, but I get this error:

Missing input files for rule fastq_grep: 0_R1_001.fastq.gz 0_R2_001.fastq.gz

Those two files do not exist. Where is it getting them from?

I would expect to see a lot of fastq files that are in my directory but it is only listing one file that does not exist.

    Fixed: as Sultan said, it's a wildcard constraint problem. I was able to address it by constraining my first wildcard to alphabetic characters with [a-z-A-Z]+$ – LucasCortes Dec 02 '22 at 16:30

1 Answer


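This is a common problem caused by the {barcodes}{sample} pattern.

Without a wildcard_constraints entry, Snakemake cannot know where {barcodes} ends and where {sample} starts. By default each wildcard matches the regex .+, and matching is greedy, so {barcodes} swallows everything except the last character. Right now, Snakemake thinks that your sample wildcard is just 0, which is why it looks for 0_R1_001.fastq.gz.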
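
To see why the split goes wrong, here is a minimal sketch in plain Python that mimics the matching with the `re` module. It only approximates what Snakemake does internally. The file name, the barcode `ACGT`, the sample name `10`, and the constraint `[A-Za-z]+` are all assumptions made for illustration and are not taken from your data.

```python
import re

# Hypothetical target file: barcode "ACGT" directly followed by sample "10".
target = "grepped/ACGT10_R1_001.plate.fastq"

# Unconstrained: every wildcard behaves like the greedy regex ".+".
loose_pattern = re.compile(
    r"grepped/(?P<barcodes>.+)(?P<sample>.+)_R1_001\.plate\.fastq$"
)
loose = loose_pattern.match(target)
# The first group takes as much as it can and leaves one character for the second.
print(loose.group("barcodes"), loose.group("sample"))  # ACGT1 0

# Constrained: barcodes may only contain letters (an assumed constraint).
strict_pattern = re.compile(
    r"grepped/(?P<barcodes>[A-Za-z]+)(?P<sample>.+)_R1_001\.plate\.fastq$"
)
strict = strict_pattern.match(target)
print(strict.group("barcodes"), strict.group("sample"))  # ACGT 10
```

In the Snakefile, the equivalent would be a wildcard_constraints directive, either global or inside the rule, that sets barcodes to a regex fitting your real barcodes, for example barcodes="[A-Za-z]+" if your sample names never start with a letter. Putting a separator such as an underscore between {barcodes} and {sample} is another way to remove the ambiguity.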

SultanOrazbayev