
Essentially, I am trying to write a single Snakemake rule that trims both paired-end and single-end reads. My problem is that single-end reads produce 1 output, while paired-end reads produce 2 (technically 4, but my rule only declares 2). I think the error I get has to do with my output section. I'll show what I have first.

config.yaml:

sample_file: "sample.tab"

FASTQ_DIR: "/dir/data/fastq_files"
TRIMMED_DIR: "/dir/data/trimmed"

sample.tab:

Sample  Layout
SRR11213896     SE
ERR3887380      PE

Snakefile:

configfile: "config.yaml"
FASTQ_DIR = config["FASTQ_DIR"]
TRIMMED_DIR = config["TRIMMED_DIR"]
import pandas as pd
sample_file = config["sample_file"]
samples_df = pd.read_table(sample_file).set_index("Sample", drop = True)
srr_samples = list(samples_df.index)
srr_unpaired = list(samples_df[samples_df["Layout"] == "SE"].index)
srr_paired = list(samples_df[samples_df["Layout"] == "PE"].index)

def get_reads(wc):
        tag = samples_df.loc[samples_df.index == wc.sample, 'Layout'].iloc[0]
        if tag == "SE":
                return FASTQ_DIR + "/" + wc.sample + "_1M.fastq"
        if tag == "PE":
                return FASTQ_DIR + "/" + wc.sample + "_1M_R1.fastq", FASTQ_DIR + "/" + wc.sample + "_1M_R2.fastq"

rule all:
        input:
                expand(TRIMMED_DIR + "/{sample}_trimmed.fastq", sample = srr_unpaired),
                expand(TRIMMED_DIR + "/{sample}_R1_trimmed.fastq", sample = srr_paired),
                expand(TRIMMED_DIR + "/{sample}_R2_trimmed.fastq", sample = srr_paired)

rule trimmed:
        input:
                reads = get_reads
        output:
                unpaired = TRIMMED_DIR + "/{sample}_trimmed.fastq",
                paired_r1 = TRIMMED_DIR + "/{sample}_R1_trimmed.fastq",
                paired_r2 = TRIMMED_DIR + "/{sample}_R2_trimmed.fastq"
        
        params:
                tag = lambda wc: samples_df.loc[samples_df.index == wc.sample, 'Layout'].iloc[0],
                to_trim = "TRAILING:30 SLIDINGWINDOW:4:15 MINLEN:15",
                read = lambda wc: samples_df.loc[samples_df.index == wc.sample].index[0],
                dir = TRIMMED_DIR + "/",
                read_dir = FASTQ_DIR + "/"
        run: 
                if params.tag == "SE":
                        shell("trimmomatic {params.tag} {input.reads} {output.unpaired} {params.to_trim}")
                if params.tag == "PE":
                        shell("trimmomatic {params.tag} {input.reads} {output.paired_r1} {params.dir}{params.read}_R1_unpaired.fastq {output.paired_r2} {params.dir}{params.read}_R2_unpaired.fastq {params.to_trim}")

Running snakemake -n (a dry run) gives no errors, but running snakemake itself fails. Here is the Snakemake log:

Building DAG of jobs...
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
        count   jobs
        1       all
        2       trimmed
        3

rule trimmed:
    input: /dir/data/fastq_files/ERR3887380_1M_R1.fastq, /dir/data/fastq_files/ERR3887380_1M_R2.fastq
    output: /dir/data/trimmed/ERR3887380_trimmed.fastq, /dir/data/trimmed/ERR3887380_R1_trimmed.fastq, /dir/data/trimmed/ERR3887380_R2_trimmed.fastq
    jobid: 2
    wildcards: sample=ERR3887380

Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/snakemake.log

I suspect it's because the output: line in my snakemake.log lists 3 outputs when, for this paired-end sample, there should only be 2. Does anyone have any ideas for getting around this? Any help would be much appreciated!
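
The suspicion about the output list can be checked on paper. Snakemake expects a job to create every file listed under a rule's output, so a paired-end job of rule trimmed is also expected to write the single-end file. The following is a minimal Python sketch of that comparison, not Snakemake's own code. The function names declared_outputs, written_outputs and missing_outputs are hypothetical helpers introduced for illustration, and the paths are taken from config.yaml above. Since the log shown does not include the actual error message, this may be only one contributor to the failure.

```python
TRIMMED_DIR = "/dir/data/trimmed"  # value from config.yaml


def declared_outputs(sample):
    """Every file rule `trimmed` lists under `output:` for one sample."""
    return [
        f"{TRIMMED_DIR}/{sample}_trimmed.fastq",
        f"{TRIMMED_DIR}/{sample}_R1_trimmed.fastq",
        f"{TRIMMED_DIR}/{sample}_R2_trimmed.fastq",
    ]


def written_outputs(sample, layout):
    """Files the `run:` block passes to trimmomatic as outputs for a layout."""
    if layout == "SE":
        return [f"{TRIMMED_DIR}/{sample}_trimmed.fastq"]
    if layout == "PE":
        return [
            f"{TRIMMED_DIR}/{sample}_R1_trimmed.fastq",
            f"{TRIMMED_DIR}/{sample}_R1_unpaired.fastq",
            f"{TRIMMED_DIR}/{sample}_R2_trimmed.fastq",
            f"{TRIMMED_DIR}/{sample}_R2_unpaired.fastq",
        ]
    raise ValueError(f"unknown layout: {layout!r}")


def missing_outputs(sample, layout):
    """Declared outputs that the chosen branch never writes."""
    written = set(written_outputs(sample, layout))
    return [path for path in declared_outputs(sample) if path not in written]


if __name__ == "__main__":
    # Paired-end sample: the single-end output is declared but never produced.
    print(missing_outputs("ERR3887380", "PE"))
    # Single-end sample: both paired-end outputs are declared but never produced.
    print(missing_outputs("SRR11213896", "SE"))
```

Under this model neither layout can satisfy the rule as written, because each branch leaves at least one declared output uncreated.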

Hannah
  • maybe this is helpful https://stackoverflow.com/questions/46066571/accepting-slightly-different-inputs-to-snakemake-rule-fq-vs-fq-gz – mitoRibo Nov 19 '21 at 04:05
  • I think this is pretty much a duplicate of https://stackoverflow.com/questions/68337750/gracefully-handle-variable-number-of-input-and-output-files see if that helps – dariober Nov 19 '21 at 07:08
  • You can split the trim rule into two, such as trim_SE and trim_PE – Chang Ye Dec 27 '22 at 22:34
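
The split suggested in the last comment can be sketched in plain Python by giving each layout its own inputs, outputs and command, which is what two separate rules (for example trim_SE and trim_PE) would each declare. This is an illustrative sketch under assumptions, not a tested Snakefile: trim_se_command and trim_pe_command are hypothetical names, the paths and trimming steps are copied from the question, and the regular expressions only imitate how a {sample} wildcard is matched.

```python
import re

FASTQ_DIR = "/dir/data/fastq_files"  # from config.yaml
TRIMMED_DIR = "/dir/data/trimmed"    # from config.yaml
TO_TRIM = "TRAILING:30 SLIDINGWINDOW:4:15 MINLEN:15"


def trim_se_command(sample):
    """Command a single-end rule would run: one input, one output."""
    reads = f"{FASTQ_DIR}/{sample}_1M.fastq"
    trimmed = f"{TRIMMED_DIR}/{sample}_trimmed.fastq"
    return f"trimmomatic SE {reads} {trimmed} {TO_TRIM}"


def trim_pe_command(sample):
    """Command a paired-end rule would run: two inputs, four outputs."""
    r1 = f"{FASTQ_DIR}/{sample}_1M_R1.fastq"
    r2 = f"{FASTQ_DIR}/{sample}_1M_R2.fastq"
    outputs = [
        f"{TRIMMED_DIR}/{sample}_R1_trimmed.fastq",
        f"{TRIMMED_DIR}/{sample}_R1_unpaired.fastq",
        f"{TRIMMED_DIR}/{sample}_R2_trimmed.fastq",
        f"{TRIMMED_DIR}/{sample}_R2_unpaired.fastq",
    ]
    return f"trimmomatic PE {r1} {r2} {' '.join(outputs)} {TO_TRIM}"


# An unconstrained wildcard behaves like the regex ".+", so the single-end
# pattern "{sample}_trimmed.fastq" also matches a paired-end file name.
loose = re.fullmatch(r"(.+)_trimmed\.fastq", "ERR3887380_R1_trimmed.fastq")
# Forbidding underscores in the sample name removes that overlap.
strict = re.fullmatch(r"([^_]+)_trimmed\.fastq", "ERR3887380_R1_trimmed.fastq")

if __name__ == "__main__":
    print(trim_se_command("SRR11213896"))
    print(trim_pe_command("ERR3887380"))
    print(loose.group(1), strict)
```

With two rules, each job only declares the files its own command writes. The regex check at the end hints at one thing to watch for when splitting: the two output patterns can overlap, so a wildcard_constraints entry on sample or separate output directories for the two layouts may be needed to keep the rules unambiguous.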
