Does Kedro support Checkpointing/Caching of Results?

Question

Let's say we have multiple long-running pipeline nodes. It seems quite straightforward to checkpoint or cache the intermediate results, so that when nodes after a checkpoint are changed or added, only those nodes need to be executed again.

Does Kedro provide functionality to make sure that, when I run the pipeline, only those steps that have changed are executed? And the reverse: is there a way to make sure that all steps that have changed are executed?

Let's say a pipeline producing some intermediate result has changed. Will it be executed when I execute a pipeline that depends on the output of the first?

TL;DR: Does Kedro have makefile-like tracking of what needs to be done and what not?

I think my question is similar to issue #341, but I do not require support of cyclic graphs.

score 2 · Accepted Answer · answered Jun 05 '20 at 13:08

You might want to have a look at the IncrementalDataSet, which is described in the partitioned dataset documentation, specifically in the section on incremental loads with the incremental dataset. It has a notion of "checkpointing", although checkpointing there is a manual step and is not automated the way a makefile is.
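To illustrate what "manual checkpointing" means here, below is a minimal standard-library sketch of the general idea: remember the last processed partition, load only partitions that sort after it, and advance the checkpoint explicitly once processing has succeeded. This is not Kedro's implementation or API. The names `load_new_partitions`, `confirm`, and the `CHECKPOINT` file are hypothetical and chosen only for this example, which also assumes that partition file names sort chronologically.

```python
import tempfile
from pathlib import Path
from typing import List


def load_new_partitions(data_dir: Path, checkpoint_file: Path) -> List[str]:
    """Return partition names that sort after the stored checkpoint."""
    # An absent checkpoint file means nothing has been processed yet.
    last_seen = checkpoint_file.read_text() if checkpoint_file.exists() else ""
    names = sorted(p.name for p in data_dir.iterdir() if p.is_file())
    return [name for name in names if name > last_seen]


def confirm(checkpoint_file: Path, processed: List[str]) -> None:
    """Manually advance the checkpoint to the last processed partition."""
    # Nothing is tracked automatically: the caller decides when to confirm.
    if processed:
        checkpoint_file.write_text(processed[-1])


with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp) / "partitions"
    data_dir.mkdir()
    # Kept outside data_dir so it is not mistaken for a partition.
    checkpoint_file = Path(tmp) / "CHECKPOINT"

    for name in ("2020-06-01.csv", "2020-06-02.csv"):
        (data_dir / name).write_text("a,b\n1,2\n")

    # First run: no checkpoint yet, so every partition is loaded.
    first_run = load_new_partitions(data_dir, checkpoint_file)
    confirm(checkpoint_file, first_run)

    # A new partition arrives; only that one is loaded on the next run.
    (data_dir / "2020-06-03.csv").write_text("a,b\n3,4\n")
    second_run = load_new_partitions(data_dir, checkpoint_file)
    confirm(checkpoint_file, second_run)
```

Note that this only tracks which input partitions are new. It does not detect changes to node code, so it is not a substitute for makefile-style dependency tracking.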
