Each wildcard in the input section must have a corresponding wildcard (with the same name) in the output section. This follows from how Snakemake works: when it constructs the DAG of jobs and finds that it needs a certain file, it looks at the output section of each rule and checks whether that rule can produce the required file. Matching the requested filename against the output pattern is how Snakemake assigns concrete values to the wildcards in the output section. Every wildcard in the other sections must match one of the wildcards in output, and that is how input gets concrete filenames.
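To make the matching step concrete, here is a minimal Python sketch of the idea. It is not Snakemake's actual implementation: the helper name match_output and the input pattern raw/{sample}.fastq.gz are made up for illustration, and the sketch assumes each wildcard appears only once in the pattern.

```python
import re


def match_output(pattern, target):
    """Return the wildcard values if `target` matches `pattern`, else None."""
    regex = ""
    pos = 0
    # Turn every "{name}" placeholder into a named group; escape the rest.
    for m in re.finditer(r"\{(\w+)\}", pattern):
        regex += re.escape(pattern[pos:m.start()])
        regex += f"(?P<{m.group(1)}>.+)"
        pos = m.end()
    regex += re.escape(pattern[pos:])
    found = re.fullmatch(regex, target)
    return found.groupdict() if found else None


# The requested file fixes the value of every wildcard in the output pattern.
wildcards = match_output("{sample}/data/merged.fastq.gz", "A/data/merged.fastq.gz")
print(wildcards)  # {'sample': 'A'}

# Only those values are available for filling in the input pattern
# (hypothetical pattern, for illustration only).
input_file = "raw/{sample}.fastq.gz".format(**wildcards)

# A wildcard that never appears in the output pattern cannot be filled in.
try:
    "{pass_file}".format(**wildcards)
except KeyError as missing:
    print(f"dangling wildcard: {missing}")
```

The last step is the situation a rule ends up in when its input uses a wildcard name that the output does not define.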
Now let's look at your rule merge_fastq:
    rule merge_fastq:
        input:
            directory("{pass_file}")
        output:
            "{sample}/data/merged.fastq.gz"
        wildcard_constraints:
            id="*.fastq.gz"
        shell:
            "cat {input}/{id} > {output}"
The only wildcard that can get a value is {sample}. {pass_file} and {id} are dangling: they do not appear in the output section, so Snakemake has no way to assign them values.
As far as I can see, you are trying to merge files that are not known at design time. Take a look at dynamic files (superseded by checkpoints in newer Snakemake releases), checkpoint, and using a function in the input.
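As an illustration of the last option, here is a minimal sketch of an input function. The factory name make_fastq_lookup and the "data" root in the usage comment are hypothetical; the only Snakemake behavior relied on is that an input function is called with the wildcards resolved from the output pattern.

```python
import glob
import os


def make_fastq_lookup(root):
    """Build an input function that lists the *.fastq.gz files of one sample."""

    def fastq_for_sample(wildcards):
        # `wildcards.sample` comes from matching the output pattern.
        pattern = os.path.join(root, wildcards.sample, "*.fastq.gz")
        return sorted(glob.glob(pattern))

    return fastq_for_sample


# Hypothetical use inside a Snakefile (not executed here):
#
#   rule merge_fastq:
#       input:
#           make_fastq_lookup("data")
#       output:
#           "{sample}/data/merged.fastq.gz"
#       shell:
#           "cat {input} > {output}"
```

Note that an input function is evaluated while the DAG is being built, so this only works if the files already exist at that point. If they are produced by an earlier rule of the same workflow, a checkpoint is the better fit.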
The rest of your Snakefile is hard to understand. For example, I don't see how you specify the files that match this pattern: "{sample}/data/merged.fastq.gz".
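One common way to request such files is to list the concrete targets in a top-level rule. A minimal sketch follows, assuming the sample names are known up front; the SAMPLES list and the helper merged_targets are hypothetical and not taken from your Snakefile.

```python
# Hypothetical sample names; in practice they might come from the config file.
SAMPLES = ["sampleA", "sampleB"]


def merged_targets(samples, pattern="{sample}/data/merged.fastq.gz"):
    """Fill the wildcard pattern once per sample to get concrete target files."""
    return [pattern.format(sample=s) for s in samples]


targets = merged_targets(SAMPLES)

# Hypothetical use inside a Snakefile (not executed here):
#
#   rule all:
#       input:
#           merged_targets(SAMPLES)
```

Snakemake's built-in expand("{sample}/data/merged.fastq.gz", sample=SAMPLES) does the same job; the helper only spells out what happens.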
Update:
Let's say I have a directory(/home/other_computer/jobs/data/<sample_name>/*.fastq.gz) which is my input, and the output is (/result/merged/<sample_name>/merged.fastq.gz). What I tried is having the first path as input: {"pass_files"} (this comes from my config file) and output: "result/merged/{sample}/merged.fastq.gz"
First, let's simplify the task a little and replace {pass_file} with a hardcoded path. You have two degrees of freedom: the <sample_name> and the unknown files in the /home/other_computer/jobs/data/<sample_name>/ folder. The <sample_name> is a good candidate for a wildcard, as its value can be derived from the target file. The unknown number of *.fastq.gz files doesn't even require any Snakemake constructs, as it can be expressed with a glob in the shell command.
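    # The sample name is taken from the requested output file; the shell glob
    # picks up however many *.fastq.gz files exist for that sample.
    # Inside shell, wildcard values are accessed through the wildcards object.
    rule merge_fastq:
        output:
            "/result/merged/{sample_name}/merged.fastq.gz"
        shell:
            "cat /home/other_computer/jobs/data/{wildcards.sample_name}/*.fastq.gz > {output}"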