0

I have 120 fastq files which have been sequenced in different locations, and I need to trim the reads in each file down to a specific length. The final length required for the reads in each file differs and this depends on the location it where it was sequenced.

In snakemake, I'm trying to write an IF statement that will run 3 alternate shell commands depending on the fastq file sequencing location. To do this I'm trying to get a PARTIAL string match of the {sample} wildcard. The Snakefile below runs but always resolves to else. If I use the entire input string it resolves correctly. I'm not sure how to write the IF statement to get the partial match I need.

This question explains how to solve the full input name match, but not a partial sub-string match.

SAMPLES = ["1111_Crdf", "2222_Extr", "3333_Edin"]

rule all:
    input:
    expand("hard_trimmed/{sample}_R2.fastq", sample = SAMPLES)

rule hard_trim_fastq:
    input:   r1 = "qc_trimmed/{sample}_R1.fastq",
             r2 = "qc_trimmed/{sample}_R2.fastq"
    output:  "hard_trimmed/{sample}_R2.fastq"
    params: crdf = 55, extr = 80, edin = 105
    run:
        if "Crdf" in wildcards.sample:
            shell("echo {input} {output} {params.crdf}")
        elif "Extr" in wildcards.sample:
            shell("echo {input} {output} {params.extr}")
        else:
            shell("echo {input} {output} {params.extr}")

Any advice/comments would be greatly appreciated.

Darren
  • 277
  • 4
  • 17

1 Answers1

1

Please check the last two shell(...) expressions, they are exactly the same:

           shell("echo {input} {output} {params.extr}")

If you fix that issue, the indentation problems, and if you provide the rule that produces output, your pipeline should work. Here is my modified snippet that works as expected:

SAMPLES = ["1111_Crdf", "2222_Extr", "3333_Edin"]

rule all:
    input:
        expand("hard_trimmed/{sample}_R2.fastq", sample = SAMPLES)

rule hard_trim_fastq:
    input:
        r1 = "qc_trimmed/{sample}_R1.fastq",
        r2 = "qc_trimmed/{sample}_R2.fastq"
    output:  "hard_trimmed/{sample}_R2.fastq"
    params:
        crdf = 55,
        extr = 80,
        edin = 105
    run:
        if "Crdf" in wildcards.sample:
            shell("echo {input} {params.crdf} > {output}")
        elif "Extr" in wildcards.sample:
            shell("echo {input} {params.extr} > {output}")
        else:
            shell("echo {input} {params.edin} > {output}")
Dmitry Kuzminov
  • 6,180
  • 6
  • 18
  • 40
  • Many Thanks. Had to be something simple - it was making no sense! Probs too long in front of the screen! – Darren Jul 07 '20 at 10:15