I am very new to snakemake and also not so fluent in python (so apologies this might be a very basic stupid question):
I am currently building a pipeline to analyze a set of bamfiles with atlas. These bamfiles are located in different folders and should not be moved to a common one. Therefore I decided to provide a samplelist looking like this (this is just an example, in reality samples might be on totaly different drives):
Sample Path
Sample1 /some/path/to/my/sample/
Sample2 /some/different/path/
And load it in my config.yaml with:
sample_file: /path/to/samplelist/samplslist.txt
Now to my Snakefile:
import pandas as pd
#define configfile with paths etc.
configfile: "config.yaml"
#read-in dataframe and define Sample and Path
SAMPLES = pd.read_table(config["sample_file"])
BAMFILE = SAMPLES["Sample"]
PATH = SAMPLES["Path"]
rule all:
input:
expand("{path}{sample}.summary.txt", zip, path=PATH, sample=BAMFILE)
#this works like a charm as long as I give the zip-function in the rules 'all' and 'summary':
rule indexBam:
input:
"{path}{sample}.bam"
output:
"{path}{sample}.bam.bai"
shell:
"samtools index {input}"
#this following command works as long as I give the specific folder for a sample instead of {path}.
rule bamdiagnostics:
input:
bam="{path}{sample}.bam",
bai=expand("{path}{sample}.bam.bai", zip, path=PATH, sample=BAMFILE)
params:
prefix="analysis/BAMDiagnostics/{sample}"
output:
"analysis/BAMDiagnostics/{sample}_approximateDepth.txt",
"analysis/BAMDiagnostics/{sample}_fragmentStats.txt",
"analysis/BAMDiagnostics/{sample}_MQ.txt",
"analysis/BAMDiagnostics/{sample}_readLength.txt",
"analysis/BAMDiagnostics/{sample}_BamDiagnostics.log"
message:
"running BamDiagnostics...{wildcards.sample}"
shell:
"{config[atlas]} task=BAMDiagnostics bam={input.bam} out={params.prefix} logFile={params.prefix}_BamDiagnostics.log verbose"
rule summary:
input:
index=expand("{path}{sample}.bam.bai", zip, path=PATH, sample=BAMFILE),
bamd=expand("analysis/BAMDiagnostics/{sample}_approximateDepth.txt", sample=BAMFILE)
output:
"{path}{sample}.summary.txt"
shell:
"echo -e '{input.index} {input.bamd}"
I get the error
WildcardError in line 28 of path/to/my/Snakefile: Wildcards in input files cannot be determined from output files: 'path'
Can anyone help me?
- I tried to solve this problem with join
, or creating input functions but I think I am just not skilled enough to see my error...
- I guess the problem is, that my summary-rule does not contain the tuplet with the {path}
for the bamdiagnostics-output (since the output is somewhere else) and cannot make the connection to the input file or so...
- Expanding my input on bamdiagnostics-rule makes the code work, but of course takes every samples input to every samples output and creates a big mess:
In this case, both bamfiles are used for the creation of each outputfile. This is wrong as the samples AND the output are to be treated independently.