I have a Snakemake recipe which contains a very expensive preparatory step, common for all its calls. Here is a pseudorule for demonstration sake:
rule sample:
input:
"{name}.config"
output:
"{name}.npz"
run:
import somemodule
data = somemodule.Loader("some_big_data") # expensive
np.savez(output, data.process(input)) # also expensive
At the moment data
is loaded de novo for every target, which is pretty suboptimal. How can I make it to be loaded only once?
I look for something which allows to rewrite the rule like that:
rule sample:
input:
"{name}.config"
output:
"{name}.npz"
setup:
import somemodule
data = somemodule.Loader("some_big_data") # expensive
run:
np.savez(output, data.process(input)) # also expensive
or:
rule sample:
input:
"{name}.config"
output:
"{name}.npz"
run:
import somemodule
data = somemodule.Loader("some_big_data") # expensive
for job in jobs:
np.savez(job.output,
data.process(job.input)) # also expensive
In another question I have described the code Loader.__init__()
is based on.