I'd like to make a workflow that downloads a list of FASTQ files from a remote server, checks their md5 sums, and runs some post-processing, e.g. alignment.
I understand how to implement this using two workflows.
First, download the file that lists the FASTQ files, e.g. the `md5` file.
Second, read the `md5` file content and create the corresponding targets in the `all` rule for the desired result files.
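To make the second step of the two-workflow approach concrete, here is a minimal sketch of how the `md5` file could be parsed into targets. The helper names `samples_from_md5` and `bam_targets` are hypothetical (they are not part of Snakemake), and the sketch assumes each line has the form `<checksum> <name>.fq.gz`, as in the test data further down.

```python
from pathlib import Path

FASTQ_SUFFIX = ".fq.gz"  # assumption: every listed file ends with this suffix


def samples_from_md5(md5_path):
    """Return the sample names (e.g. 'ID_001_1') listed in an md5sum-style file."""
    samples = []
    for line in Path(md5_path).read_text().splitlines():
        parts = line.split()
        # Skip blank or malformed lines; expect "<checksum> <file name>".
        if len(parts) != 2 or not parts[1].endswith(FASTQ_SUFFIX):
            continue
        samples.append(parts[1][: -len(FASTQ_SUFFIX)])
    return samples


def bam_targets(md5_path):
    """Build the list of BAM paths that an `all` rule could request."""
    return [f"bams/{sample}.bam" for sample in samples_from_md5(md5_path)]
```

In the second workflow, the `all` rule could then use something like `input: bam_targets("md5")`, which only works because the `md5` file already exists when that Snakefile is parsed.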
I'd like to do this in a single workflow. The incorrect workflow below shows the idea of what I'd like to achieve. The problem is that in the `input:` section of the `all` rule I don't know the `{sample}` values before the `md5` file is downloaded and parsed.
I've tried playing with `dynamic`, checkpoints and subworkflows, but failed to achieve the desired result. With `dynamic`, I only managed to implement this workflow for a `dynamic("fastq/{sample}.fq.gz.md5")` output.
I'm also interested in a solution that doesn't use `dynamic`, because it is deprecated.
rule all:
    input:
        "md5",
        "bams/{sample}.bam",

rule download_files_list:
    output: "md5"
    #shell: "wget {}".format(config["url_files_list"])
    run:
        # For testing instead of downloading:
        content = """
        bfe583337fd68b3 ID_001_1.fq.gz
        1636b6756daa65f ID_001_2.fq.gz
        0428baf25307249 ID_002_1.fq.gz
        de33d81ba5bfa62 ID_002_2.fq.gz
        """.strip()
        with open(output[0], mode="w") as f:
            print(content, file=f)

rule fastq_md5_files:
    input: "md5"
    output: "fastq/{sample}.fq.gz.md5"
    shell: "mkdir -p fastq && awk '{{ print $0 > (\"fastq/\" $2 \".md5\") }}' {input}"

rule download_fastq_and_check_md5:
    input: "fastq/{sample}.fq.gz.md5"
    output: "fastq/{sample}.fq.gz"
    #shell: "wget {}/{{sample}} && md5sum --check {{input}}".format(config["url_file_prefix"])
    shell: "touch {output}"

rule align_fastq:
    input: "fastq/{sample}.fq.gz"
    output: "bams/{sample}.bam"
    shell: "touch {output}"  # aligning task