Snakemake › Access multiple keys from config file

Question

I have a question about the proper handling of the config file. I have been trying to solve this for a couple of days, but I just can't figure out how to do it. I know this question is probably similar to others here, and I really tried to use them, but I didn't fully understand them. I hope that some things about how Snakemake works will become clearer once I have solved this problem.

I'm just switching to Snakemake and thought I could easily convert my bash script. To get familiar with Snakemake, I started with a simple data-processing pipeline. I know I could solve my case by defining every variable within the Snakefile, but I want to use an external config file. For clarity, I decided to post only the code that I thought would work somehow. I already played around with different versions of a "rule all" and the "lambda" functions, but nothing has worked so far and posting them would just be confusing. I'm a bit embarrassed and confused about why I can't get this working. The variable name differs from the key because I always had a version where I redefine the variable, like: `sample = config["samples"]`

I would be incredibly thankful for an example code.

What I'd like to have is:

The config file:

    samples:
    - SRX1232390
    - SRX2312380
    names:
    - SomeData
    - SomeControl
    adapters:
    - GATCGTAGC
    - GATCAGTCG

Then I thought I could just use the keys like separate variables.

    rule download_fastq:
        output:
            "fastq/{name}.fastq.gz"
        shell:
            "fastq-dump {wildcards.sample} > {output}"

Later there will be more rules, and I thought each of them would also just need a key:

    rule trimming_cutadapt:
        input:
            "fastq/{name}.fastq"
        output:
            "ctadpt_{name}.fastq"
        shell:
            "cutadapt -a {adapt}"

I also tried something with a config file like this:

    samples:
      Somedata: SRX1232131
      SomeControl: SRX12323

But in the end I didn't find a working solution with that either, and I wouldn't know how to add a third "variable" to it. I hope it is clear what I am trying to achieve. It would be great if someone could help me.

EDIT:

OK, I reworked my code and tried to dig into everything. I fear I am failing to connect the things I have read. I would really appreciate some tips to help me resolve my confusion. First of all: rather than trying to download the data from within the pipeline, I decided to do this in a config step. I have now tried two different versions:

Based on this answer I tried version one. I like the version with the two files. But I'm stuck on how to work with the variables now, for example in a lambda function or anywhere you would normally write `config["sample"]`. So my problem here is that I don't know how to proceed or what the correct syntax is for accessing the variables.

    # version one
    configfile: "config.yaml"
    sample_file = config["sample_file"]

    import pandas as pd

    sample = pd.read_table(sample_file)['samples']
    adapt = pd.read_table(sample_file)['adapters']

    rule trimming_cutadapt:
        input:
            data=expand("../data/{sample}.fastq", name = pd.read_table(sample_file)['names']),
            lambda wildcards: ???
        output:
            "trimmed/ctadpt_{sample}.fastq"
        shell:
            "cutadapt -a {adapt}"

So I went back to trying to understand how wildcards are used and defined. Among other things, I looked into the example Snakefile and the example rules by Johannes, and of course into the manual. Oh, and the part about the zip function.

At least I no longer get an error saying that it can't deal with wildcards or something similar. Now it just does nothing, and I can't find out why because I don't get any information. Additionally, I marked some points that I don't understand.

    # version two
    configfile: "config_ChIP_Seq_Pipeline.yaml"

    rule all:
        input:
            expand("../data/{sample}.fastq", sample=config["samples"])

    # when to write the lambda or the expand in a rule all and when into the actual rule?
    rule trimming_cutadapt:
        input:
            "../data/{sample}.fastq"
        params:
            # why do I have to write .sample? when do I have to use wildcards.XXX in the shell part?
            adapt=lambda wildcards: config[wildcards.sample]["adapt"]
        output:
            "trimmed/ctadpt_{sample}.fastq"
        shell:
            "cutadapt -a {params.adapt}"

As a testfile I used this one. My configfile in version 1:

    sample_file: "sample.tab"

and the tab file:

    samples    names     adapters
    test_1     input     GACCTA

and the configfile from version two:

    samples:
    - test_1

    adapt:
    - GTACGTAG
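
A minimal sketch, in plain Python, of what the `params` lambda in version two looks up when the version-two config above is loaded into a dict. `SimpleNamespace` is only a stand-in for the wildcards object Snakemake passes to the function, and `adapter_for` is a hypothetical helper name. With this config, `config[wildcards.sample]` looks for a top-level key `test_1`, which does not exist. Looking the adapter up by list position is one possible workaround, not necessarily the best layout.

```python
from types import SimpleNamespace

# The version-two config file as a plain dict.
config = {"samples": ["test_1"], "adapt": ["GTACGTAG"]}

# Stand-in for Snakemake's wildcards object (assumption for this sketch).
wildcards = SimpleNamespace(sample="test_1")

# The lookup from the rule: "test_1" is not a top-level key, so it fails.
try:
    config[wildcards.sample]["adapt"]
    lookup_error = None
except KeyError as err:
    lookup_error = err


def adapter_for(wildcards):
    """Hypothetical helper: find the adapter at the same position as the sample."""
    position = config["samples"].index(wildcards.sample)
    return config["adapt"][position]


print(repr(lookup_error))
print(adapter_for(wildcards))
```

In a Snakefile, a function like this could be passed directly as `params: adapt=adapter_for`, since `params` functions receive the wildcards as their first argument. The attribute is `.sample` because the wildcard in the file patterns is named `{sample}`. Separately, `rule all` in version two requests `../data/{sample}.fastq`, which is the input of `trimming_cutadapt` rather than its output; if that file already exists, there is nothing left to build. Requesting `trimmed/ctadpt_{sample}.fastq` instead would make the trimming rule a required step.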

Thanks for your help and patience!

Cheers

Have you looked at how to use the [configfile](https://snakemake.readthedocs.io/en/stable/tutorial/advanced.html?highlight=configfile#step-2-config-files)? Data from configfile gets read into `config` dictionary (not `wildcards` as in your code) from which you can retrieve the "variables" you are interested in. — Manavalan Gajapathy, May 14 '19 at 15:59
yea I did. But the problem is that with fastq-dump there will be no file but just the accession number. So it's just "fastq-dump SRX1235621" — VcFbnne, May 15 '19 at 12:16
I would suggest cleaning up your question and posting an example with minimal config and minimal code to show your problem. As it stands now, the question is rather broad. — Manavalan Gajapathy, May 15 '19 at 15:38
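
As a concrete illustration of the first comment (config values end up in the `config` dictionary), here is a sketch in plain Python of a config keyed by output name, so that each name carries its own accession and adapter. The nested layout and the helper names `accession_for` and `adapter_for` are assumptions made for this example; only the values come from the question.

```python
# Assumed nested layout; in YAML this would be a mapping under "samples".
config = {
    "samples": {
        "SomeData": {"accession": "SRX1232390", "adapter": "GATCGTAGC"},
        "SomeControl": {"accession": "SRX2312380", "adapter": "GATCAGTCG"},
    }
}


def accession_for(name):
    """Hypothetical helper: accession number for a given output name."""
    return config["samples"][name]["accession"]


def adapter_for(name):
    """Hypothetical helper: adapter sequence for a given output name."""
    return config["samples"][name]["adapter"]


# One target file per configured name, in the order given in the config.
targets = [f"fastq/{name}.fastq.gz" for name in config["samples"]]

print(targets)
print(accession_for("SomeData"), adapter_for("SomeData"))
```

With a layout like this, a rule whose output is `"fastq/{name}.fastq.gz"` could define `params: accession=lambda wildcards: accession_for(wildcards.name)` and refer to `{params.accession}` in its shell command, which addresses the point that the download tool takes an accession number rather than an input file.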

score 0 · Answer 1 · answered May 14 '19 at 18:14

You can look at this post to see how to store and access sample information.

Then you can look at Snakemake documentation here, more specifically at the zip function, which might help you as well.
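
To give a rough idea of why zipping is relevant here, the following sketch uses only Python's built-in `zip` and `itertools.product` on the three parallel lists from the question's config. It is not Snakemake code; the variable names are chosen for this example. It contrasts position-wise pairing, which keeps each name with its own adapter, against all combinations, which would also pair each name with the other sample's adapter.

```python
from itertools import product

# The parallel lists from the question's config file.
config = {
    "samples": ["SRX1232390", "SRX2312380"],
    "names": ["SomeData", "SomeControl"],
    "adapters": ["GATCGTAGC", "GATCAGTCG"],
}

# Position-wise pairing: entry i of one list goes with entry i of the other.
paired = list(zip(config["names"], config["adapters"]))

# All combinations: every name with every adapter, including wrong pairings.
crossed = list(product(config["names"], config["adapters"]))

# A lookup table built from the position-wise pairs.
adapter_by_name = dict(paired)

print(paired)
print(len(crossed))
print(adapter_by_name["SomeControl"])
```

Snakemake's `expand` combines its keyword lists as a product by default and accepts `zip` as an optional second argument to pair them by position instead, which is the behavior the parallel-list config relies on.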

rioualen

To be a useful answer, you need to indicate how the linked material answers the question. Please give at least one example. – merv May 15 '19 at 20:19
