I am trying to align RNA-seq data using Salmon, however because I work with a variety of mouse strains the genome index required will depend on the strain of mouse. I keep this information in the file name so the idea is that if the input file has a specific name then a certain index should be used, otherwise another index should be used.
This script runs perfectly and returns all the files I need, however, it will always use the shell command in the else statement, and completely ignores the if statement despite the file names technically being identical.
I've tried various things and have been unsuccessful, so I would also like to know the best way (not necessarily the most generalizable since this pipeline is only for my use) to accomplish what I would like.
Code:
rule salmon_mapping:
input:
fastq="data/raw_data/{sample}.fastq.gz",
c57_index="/gpfs/data01/glasslab/home/cag104/refs/Mus_musculus/Ensemble/C57/Salmon_Index/",
aj_index="/gpfs/data01/glasslab/home/cag104/refs/Mus_musculus/Ensemble/AJ/Salmon_Index/"
log:
"logs/salmon/{sample}_quant.log"
params:
prefix="{sample}"
output:
"data/processed_data/{sample}/quant.sf",
"data/processed_data/{sample}/cmd_info.json"
run:
if {input.fastq} == "data/raw_data/mouse_AJ_M_Liver_Hepatocyte_RNA_notx_CG_1_VML_s20180131_GTCCGC.fastq.gz" or {input.fastq} == "data/raw_data/mouse_AJ_M_Liver_Hepatocyte_RNA_notx_CG_2_VML_s20180131_GTGAAA.fastq.gz":
shell("salmon quant -p 16 -i {input.aj_index} -l A -r <(gunzip -c {input.fastq}) -o data/processed_data/{wildcards.sample} &> {log}")
else:
shell("salmon quant -p 16 -i {input.c57_index} -l A -r <(gunzip -c {input.fastq}) -o data/processed_data/{wildcards.sample} &> {log}"