I'm currently writing a pipeline that looks like this (code for the minimal example is below; the input files are just blank files whose names are in the SAMPLES list in the example).
What I would like is this: if a sample fails in one of the first two steps (the minimal example is set up to make sample1 fail on rule two), keep going with all the subsequent steps as if that sample weren't there (meaning the rules gather_and_do_something and split_final would run only on sample2 and sample3 here).
I'm already using the --keep-going option to carry on with independent jobs, but I have trouble defining the input of the common rule so that it ignores the files that sit on a failing path.
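For reference, I create the blank input files and run the workflow roughly like this (the core count is arbitrary):

    touch sample1 sample2 sample3
    snakemake --keep-going --cores 1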
SAMPLES = ["sample1", "sample2", "sample3"]

rule all:
    input:
        expand("{sample}_final", sample=SAMPLES)

rule one:
    input:
        "{sample}"
    output:
        "{sample}_ruleOne"
    shell:
        "touch {output}"

# Deliberately fails for sample1: no output file gets written,
# so Snakemake marks the job as failed.
rule two:
    input:
        rules.one.output
    output:
        "{sample}_ruleTwo"
    run:
        if input[0] != 'sample1_ruleOne':
            with open(output[0], 'w') as fh:
                fh.write(f'written {output[0]}')

rule gather_and_do_something:
    input:
        expand(rules.two.output, sample=SAMPLES)
    output:
        'merged'
    run:
        # Collect the first line of each per-sample file and merge them.
        words = []
        for f in input:
            with open(f, 'r') as fh:
                words.append(next(fh))
        if len(input):
            with open(output[0], 'w') as fh:
                fh.write('\n'.join(words))

rule split_final:
    input:
        rules.gather_and_do_something.output
    output:
        '{sample}_final'
    shell:
        'touch {output}'
I tried writing a custom function to use as the input, but that doesn't seem to work, presumably because input functions are evaluated while the DAG is built, before rule two has run:
import os

def get_files(wildcards):
    # Keep only the rule two outputs that already exist on disk.
    return [f for f in expand(rules.two.output, sample=SAMPLES) if os.path.exists(f)]

rule gather_and_do_something:
    input:
        unpack(get_files)
    output:
        'merged'
    run:
        words = []
        for f in input:
            with open(f, 'r') as fh:
                words.append(next(fh))
        if len(input):
            with open(output[0], 'w') as fh:
                fh.write('\n'.join(words))
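For what it's worth, adding a print to the same function (a quick debugging sketch, not part of the pipeline) shows it being called while the DAG is built, before any rule two job has run:

    def get_files(wildcards):
        import os
        files = [f for f in expand(rules.two.output, sample=SAMPLES) if os.path.exists(f)]
        # Prints during DAG construction, before rule two has produced anything,
        # so on a fresh run the list is empty.
        print(f'get_files called, found: {files}')
        return files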