I am trying to write a single Snakemake rule that trims both paired-end and single-end reads. My problem is that unpaired (single-end) reads produce 1 output, while paired reads produce 2 (technically 4, but my rule only declares 2 of them). I think the error I get has to do with my output section. Here is what I have.
config.yaml:
sample_file: "sample.tab"
FASTQ_DIR: "/dir/data/fastq_files"
TRIMMED_DIR: "/dir/data/trimmed"
sample.tab:
Sample Layout
SRR11213896 SE
ERR3887380 PE
Snakefile:
configfile: "config.yaml"
FASTQ_DIR = config["FASTQ_DIR"]
TRIMMED_DIR = config["TRIMMED_DIR"]
import pandas as pd
sample_file = config["sample_file"]
samples_df = pd.read_table(sample_file).set_index("Sample", drop = True)
srr_samples = list(samples_df.index)
srr_unpaired = list(samples_df[samples_df["Layout"] == "SE"].index)
srr_paired = list(samples_df[samples_df["Layout"] == "PE"].index)
def get_reads(wc):
    tag = samples_df.loc[samples_df.index == wc.sample, 'Layout'].iloc[0]
    if tag == "SE":
        return FASTQ_DIR + "/" + wc.sample + "_1M.fastq"
    if tag == "PE":
        return FASTQ_DIR + "/" + wc.sample + "_1M_R1.fastq", FASTQ_DIR + "/" + wc.sample + "_1M_R2.fastq"
rule all:
    input:
        expand(TRIMMED_DIR + "/{sample}_trimmed.fastq", sample = srr_unpaired),
        expand(TRIMMED_DIR + "/{sample}_R1_trimmed.fastq", sample = srr_paired),
        expand(TRIMMED_DIR + "/{sample}_R2_trimmed.fastq", sample = srr_paired)
rule trimmed:
    input:
        reads = get_reads
    output:
        unpaired = TRIMMED_DIR + "/{sample}_trimmed.fastq",
        paired_r1 = TRIMMED_DIR + "/{sample}_R1_trimmed.fastq",
        paired_r2 = TRIMMED_DIR + "/{sample}_R2_trimmed.fastq"
    params:
        tag = lambda wc: samples_df.loc[samples_df.index == wc.sample, 'Layout'].iloc[0],
        to_trim = "TRAILING:30 SLIDINGWINDOW:4:15 MINLEN:15",
        read = lambda wc: samples_df.loc[samples_df.index == wc.sample].index[0],
        dir = TRIMMED_DIR + "/",
        read_dir = FASTQ_DIR + "/"
    run:
        if params.tag == "SE":
            shell("trimmomatic {params.tag} {input.reads} {output.unpaired} {params.to_trim}")
        if params.tag == "PE":
            shell("trimmomatic {params.tag} {input.reads} {output.paired_r1} {params.dir}{params.read}_R1_unpaired.fastq {output.paired_r2} {params.dir}{params.read}_R2_unpaired.fastq {params.to_trim}")
Running snakemake -n (a dry run) gives no errors, but when I run snakemake itself I get an error. Here is the Snakemake log:
Building DAG of jobs...
Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count    jobs
    1        all
    2        trimmed
    3
rule trimmed:
    input: /dir/data/fastq_files/ERR3887380_1M_R1.fastq, /dir/data/fastq_files/ERR3887380_1M_R2.fastq
    output: /dir/data/trimmed/ERR3887380_trimmed.fastq, /dir/data/trimmed/ERR3887380_R1_trimmed.fastq, /dir/data/trimmed/ERR3887380_R2_trimmed.fastq
    jobid: 2
    wildcards: sample=ERR3887380
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/snakemake.log
I suspect it's because the output: section of my snakemake.log lists 3 outputs, when for this paired-end sample there should only be 2. Does anyone have any ideas for getting around this? Any help would be much appreciated!
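To make the suspicion concrete, here is a small plain-Python sketch, separate from the Snakefile. It assumes that every pattern listed under output: is declared for every sample the rule runs on, which is what the log shows for ERR3887380. The helper names declared_outputs, written_outputs, and never_written are made up for illustration and are not Snakemake API.

```python
# Plain-Python illustration only; helper names are hypothetical.
TRIMMED_DIR = "/dir/data/trimmed"
LAYOUTS = {"SRR11213896": "SE", "ERR3887380": "PE"}  # mirrors sample.tab


def declared_outputs(sample):
    """All three patterns from the output: section of rule trimmed."""
    return [
        f"{TRIMMED_DIR}/{sample}_trimmed.fastq",
        f"{TRIMMED_DIR}/{sample}_R1_trimmed.fastq",
        f"{TRIMMED_DIR}/{sample}_R2_trimmed.fastq",
    ]


def written_outputs(sample):
    """Declared outputs that the run: block actually passes to trimmomatic."""
    if LAYOUTS[sample] == "SE":
        return [f"{TRIMMED_DIR}/{sample}_trimmed.fastq"]
    return [
        f"{TRIMMED_DIR}/{sample}_R1_trimmed.fastq",
        f"{TRIMMED_DIR}/{sample}_R2_trimmed.fastq",
    ]


def never_written(sample):
    """Declared outputs that no branch of the run: block produces."""
    written = set(written_outputs(sample))
    return [path for path in declared_outputs(sample) if path not in written]


for sample in LAYOUTS:
    print(sample, never_written(sample))
```

For ERR3887380 this reports the _trimmed.fastq file, which is the extra third entry in the output: line of the log. For SRR11213896 it reports the two _R1/_R2 files instead.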