I have 120 fastq files which have been sequenced in different locations, and I need to trim the reads in each file down to a specific length. The final length required for the reads in each file differs and this depends on the location it where it was sequenced.
In snakemake, I'm trying to write an IF statement that will run 3 alternate shell commands depending on the fastq file sequencing location. To do this I'm trying to get a PARTIAL string match of the {sample}
wildcard. The Snakefile below runs but always resolves to else. If I use the entire input string it resolves correctly. I'm not sure how to write the IF statement to get the partial match I need.
This question explains how to solve the full input name match, but not a partial sub-string match.
SAMPLES = ["1111_Crdf", "2222_Extr", "3333_Edin"]
rule all:
input:
expand("hard_trimmed/{sample}_R2.fastq", sample = SAMPLES)
rule hard_trim_fastq:
input: r1 = "qc_trimmed/{sample}_R1.fastq",
r2 = "qc_trimmed/{sample}_R2.fastq"
output: "hard_trimmed/{sample}_R2.fastq"
params: crdf = 55, extr = 80, edin = 105
run:
if "Crdf" in wildcards.sample:
shell("echo {input} {output} {params.crdf}")
elif "Extr" in wildcards.sample:
shell("echo {input} {output} {params.extr}")
else:
shell("echo {input} {output} {params.extr}")
Any advice/comments would be greatly appreciated.