0

I am trying to align RNA-seq data using Salmon, however because I work with a variety of mouse strains the genome index required will depend on the strain of mouse. I keep this information in the file name so the idea is that if the input file has a specific name then a certain index should be used, otherwise another index should be used.

This script runs perfectly and returns all the files I need, however, it will always use the shell command in the else statement, and completely ignores the if statement despite the file names technically being identical.

I've tried various things and have been unsuccessful, so I would also like to know the best way (not necessarily the most generalizable since this pipeline is only for my use) to accomplish what I would like.

Code:

rule salmon_mapping:
        input:
                fastq="data/raw_data/{sample}.fastq.gz",
                c57_index="/gpfs/data01/glasslab/home/cag104/refs/Mus_musculus/Ensemble/C57/Salmon_Index/",
                aj_index="/gpfs/data01/glasslab/home/cag104/refs/Mus_musculus/Ensemble/AJ/Salmon_Index/"
        log:
                "logs/salmon/{sample}_quant.log"
        params:
                prefix="{sample}"
        output:
                "data/processed_data/{sample}/quant.sf",
                "data/processed_data/{sample}/cmd_info.json"
        run:
                if {input.fastq} == "data/raw_data/mouse_AJ_M_Liver_Hepatocyte_RNA_notx_CG_1_VML_s20180131_GTCCGC.fastq.gz" or {input.fastq} == "data/raw_data/mouse_AJ_M_Liver_Hepatocyte_RNA_notx_CG_2_VML_s20180131_GTGAAA.fastq.gz":
                        shell("salmon quant -p 16 -i {input.aj_index} -l A -r <(gunzip -c {input.fastq}) -o data/processed_data/{wildcards.sample} &> {log}")
                else:
                        shell("salmon quant -p 16 -i {input.c57_index} -l A -r <(gunzip -c {input.fastq}) -o data/processed_data/{wildcards.sample} &> {log}"
dddxxx
  • 349
  • 2
  • 12

2 Answers2

1

Replace {input.fastq} in if statement, but not in shell statement, with input.fastq. In your example, input.fastq is a string, which is what you want, whereas {input.fastq} makes it a python set object.

Manavalan Gajapathy
  • 3,900
  • 2
  • 20
  • 43
  • Let me go ahead and give this a shot. I read somewhere about the {} being output as a set object and had previously attempted to do str({input.fastq}) as a work-around and that hadn't worked. Unfortunately the documentation isn't very clear about this. Will report back and give you the check mark! Thanks! – dddxxx Mar 10 '18 at 22:20
  • I would use the wildcard object instead of input objet `if wildcards.sample == "mouse_AJ_M_Liver_Hepatocyte_RNA_notx_CG_1_VML_s20180131_GTCCGC" or ...` – Eric C. Mar 12 '18 at 10:47
1

Use a function that defines the input to your rule based on your wildcards.

def select_fastq_and_index(wildcards):
    fq = "data/raw_data/{sample}.fastq.gz".format(sample = wildcards["sample"])
    idx = ""
    if wildcards["sample"] in [             
    "mouse_AJ_M_Liver_Hepatocyte_RNA_notx_CG_1_VML_s20180131_GTCCGC",
    "mouse_AJ_M_Liver_Hepatocyte_RNA_notx_CG_2_VML_s20180131_GTGAAA"
        ]:
            idx = foo
    else:
            idx = bar
    return fq, idx

Then rewrite your rule:

rule salmon_mapping:
        input:
                select_fastq_and_index
        log:
                "logs/salmon/{sample}_quant.log"
        params:
                prefix="{sample}"
        output:
                "data/processed_data/{sample}/quant.sf",
                "data/processed_data/{sample}/cmd_info.json"
        shell: """
            salmon quant -p 16 -i {input[1]} -l A \
            -r <(gunzip -c {input[0]}) \
            -o data/processed_data/{wildcards.sample} \
            &> {log}
            """
Russ Hyde
  • 2,154
  • 12
  • 21