4

I currently have a snakemake workflow that requires the use of lambda wildcards, set up as follows:

Snakefile:

configfile: "config.yaml"
workdir: config["work"]

rule all:
    input:
        expand("logs/bwa/{ref}.log", ref=config["refs"])

rule bwa_index:
    input:
        lambda wildcards: 'data/'+config["refs"][wildcards.ref]+".fna.gz"
    output:
        "logs/bwa/{ref}.log"
    log:
        "logs/bwa/{ref}.log"
    shell:
        "bwa index {input} 2&>1 {log}"

Config file:

work: /datasets/work/AF_CROWN_RUST_WORK/2020-02-28_GWAS

refs:
    12NC29: GCA_002873275.1_ASM287327v1_genomic
    12SD80: GCA_002873125.1_ASM287312v1_genomic

This works, but I've had to use a hack to get the output of bwa_index to play with the input of all. My hack is to generate a log file as part of bwa_index, set the log to the output of bwa_index, and then set the input of all to these log files. As I said, it works, but I don't like it. The problem is that the true outputs of bwa_index are of the format, for example, GCA_002873275.1_ASM287327v1_genomic.fna.sa. So, to specify these output files, I would need to use a lambda function for the output, something like:

rule bwa_index:
    input:
        lambda wildcards: 'data/'+config["refs"][wildcards.ref]+".fna.gz"
    output:
        lambda wildcards: 'data/'+config["refs"][wildcards.ref]+".fna.sa"
    log:
        "logs/bwa/{ref}.log"
    shell:
        "bwa index {input} 2&>1 {log}"

and then use a lambda function with expand for the input of rule all. However, snakemake will not accept functions as output, so I'm at a complete loss how to do this (other than my hack). Does anyone have suggestions of a sensible solution? TIA!

Ensa
  • 105
  • 1
  • 5

1 Answers1

8

You can use a simple python function in the inputs (as the lambda function) so I suggest you use it for the rule all.

configfile: "config.yaml"
workdir: config["work"]

def getTargetFiles():
    targets = list()
    for r in config["refs"]:
        targets.append("data/"+config["refs"][r]+".fna.sa")

    return targets

rule all:
    input:
        getTargetFiles()

rule bwa_index:
    input:
        "data/{ref}.fna.gz"
    output:
        "data/{ref}.fna.sa"
    log:
        "logs/bwa/{ref}.log"
    shell:
        "bwa index {input} 2&>1 {log}"

Careful here the wildcard {ref} is the value and not the key of your dictionnary so your log files will finally be named "logs/bwa/GCA_002873275.1_ASM287327v1_genomic.log", etc...

Eric C.
  • 3,310
  • 2
  • 22
  • 29
  • Thank you, that works a treat. However, I realise I don't quite understand how snakemake maps wildcards to values. Why is `{ref}` the value and not the key in `bwa_index`? How would you access the key if needed? Thanks again for your help. – Ensa May 20 '20 at 23:52
  • 1
    @Ensa a wildcard is basically anything. Snakemake works by trying to make the files you ask for in rule all. So you must define concrete names in this rule. In the rest of the rule, you can use wildcards to replace any string. I said that `{ref}` was the value but it's only because I asked for the values in rule all. `{ref}` is a wildcard that can be replaced by anything. – Eric C. May 21 '20 at 19:02