mapping reads using snakemake

Question

I am trying to run hisat2 mapping using the snakemake. Basically, I'm using a config.yaml file like this:

reads:
    set1: /path/to/set1/samplelist.tab
hisat2:
    database: genome
    genome: genome.fa
    nodes: 2
    memory: 8G
    arguments: --dta
executables:
    hisat2: /Tools/hisat2-2.1.0/hisat2
    samtools: /Tools/samtools-1.3/samtools

Then Snakefile:

configfile: "config.yaml"
workdir: "/path/to/working_dir/"
# Hisat2
rule hisat2:
    input:
          reads = lambda wildcards: config["reads"][wildcards.sample]
    output:
        bam = "{sample}/{sample}.bam"
    params:
        idx=config["hisat2"]["database"],
        executable = config["executables"]["hisat2"],
        nodes = config["hisat2"]["nodes"],
        memory = config["hisat2"]["memory"],
        executable2 = config["executables"]["samtools"]
    run:
        shell("{params.executable} --dta -p {params.nodes} -x {params.idx} {input.reads} |"
        "{params.executable2} view -Sbh -o {output.bam} -")
# all
rule all:
    input:
        lambda wildcards: [sample + "/" + sample + ".bam"
        for sample in config["reads"].keys()]

My samplelist.tab is like this:

id  reads1  reads2
set1a set1a_R1.fastq.gz set1a_R2.fastq.gz
set1b set1b_R1.fastq.gz set1b_R2.fastq.gz

Any hints how to make this working? I apoligize for a messy script, just started using snakemake.

basically, I am not sure it is taking read1 and read2, do I have to create two different input files for that?I made some edits to the script... — user3224522, Nov 05 '19 at 14:14
this is the error it is giving me: Error: reads file does not look like a FASTQ file. So I think it is not reading well samplelist.tab file... — user3224522, Nov 05 '19 at 14:17
reads should be the actual path to the reads, not the tab-separated file. What does the samplelist.tab look like? — Maarten-vd-Sande, Nov 06 '19 at 07:46
there is an example of samplelist.tab in the question. Could you give me the example how can I put the paths on the way that it will read me both r1 and r2? also for one set I have more than 1 fastq.gz files. — user3224522, Nov 06 '19 at 08:09
Sorry I missed the samplelist.tab. How do you mean to combine the replicates? Since there is a set1a and set1b. Or do you want to align them separately? — Maarten-vd-Sande, Nov 06 '19 at 09:19
yes, I would like to align all separately, since a and b are tissues, not replicates, but produce one single bam file for set1, i guess I can merge bam files at the end anyway — user3224522, Nov 06 '19 at 09:23

Maarten-vd-Sande · Accepted Answer · 2019-11-06T12:46:18.607

2

You will have to do something like this:

import pandas as pd

reads = pd.read_csv(config["reads"]['set1'], sep='\t', index_col=0)
def get_fastq(wildcards):
    return list(reads.loc[wildcards.sample].values)

rule hisat2:
    input:
          get_fastq
    ...

First you will need to load the samplelist and store this (I did it as a pandas dataframe). Then you can lookup which files belong to that sample name.

Edit:

Rewriting the code to look like this is much more readable (in my opinion).

rule hisat2:
    input:
          [{sample}_R1.fastq.gz,
           {sample}_R2.fastq.gz]
    ...

edited Nov 06 '19 at 12:46

answered Nov 06 '19 at 09:41

Maarten-vd-Sande

3,413
10
27

thanks, I have another 2 questions, if you prefer I can open a new discussion or post here. – user3224522 Nov 06 '19 at 10:02
What if I will list all my files in config.yaml like: set1: set1a_R1.fastq.gz, set1a_R2.fastq.gz, set1b set1b_R1.fastq.gz, set1b_R2.fastq.gz. then in the rules I define both r1 and r2 like this: r1 = lambda wildcards: config["reads"][wildcards.sample]_R1.fastq.gz r2=lambda wildcards: config["reads"][wildcards.sample]_R2.fastq.gz. ISo my questions: 1. how to list the fastq.gz files in the config.yaml? comma separated? 2. The error I ge it that there is a syntax error in the r1 and r2 lines – user3224522 Nov 06 '19 at 10:05
1

Do you actually need to specify two file paths? You could give a basepath, and fill in R1.fastq.gz and R2.fastq.gz in the end. – Maarten-vd-Sande Nov 06 '19 at 10:10
1

somethins like this: set1: set1a, set1b? – user3224522 Nov 06 '19 at 10:16
I get the error invalid syntax, so probably something is wrong :(( – user3224522 Nov 06 '19 at 11:54
My bad, it should be a list. Take a look again at the tutorial for the 'basics' – Maarten-vd-Sande Nov 06 '19 at 12:46

mapping reads using snakemake

1 Answers1