Let's say we have multiple long-running pipeline nodes. It seems quite straightforward to checkpoint or cache the intermediate results, so that when nodes after a checkpoint are changed or added, only those nodes need to be executed again.

Does Kedro provide functionality to make sure that, when I run the pipeline, only the steps that have changed are executed? And the reverse: is there a way to make sure that all steps that have changed are executed?

Let's say a pipeline that produces some intermediate result has changed. Will it be executed when I execute a pipeline that depends on the output of the first?

TL;DR: Does Kedro have Makefile-like tracking of what needs to be done and what does not?

I think my question is similar to issue #341, but I do not require support of cyclic graphs.

Sir ExecLP

1 Answer

You might want to have a look at the IncrementalDataSet, which is covered in the partitioned dataset documentation, specifically in the section on incremental loads. It has a notion of "checkpointing", although the checkpointing is a manual step and not automated the way a Makefile's tracking is.
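To make the "manual checkpoint" idea concrete, here is a minimal plain-Python sketch. It is not Kedro's implementation or API: the class `ToyIncrementalStore` and its methods are hypothetical names made up for illustration, and the assumption is simply that partitions have sortable keys and that a load only returns partitions newer than the last confirmed checkpoint.

```python
from typing import Dict, Optional


class ToyIncrementalStore:
    """Hypothetical stand-in for an incremental dataset (not Kedro's API)."""

    def __init__(self, partitions: Dict[str, str]) -> None:
        self._partitions = dict(partitions)
        self._checkpoint: Optional[str] = None   # last confirmed partition key
        self._last_loaded: Optional[str] = None  # newest key seen by load()

    def add(self, key: str, value: str) -> None:
        """Register a new partition, e.g. a newly arrived file."""
        self._partitions[key] = value

    def load(self) -> Dict[str, str]:
        """Return only the partitions newer than the confirmed checkpoint."""
        new = {
            key: value
            for key, value in sorted(self._partitions.items())
            if self._checkpoint is None or key > self._checkpoint
        }
        if new:
            self._last_loaded = max(new)
        return new

    def confirm(self) -> None:
        """Manual step: move the checkpoint to the newest loaded partition."""
        if self._last_loaded is not None:
            self._checkpoint = self._last_loaded


store = ToyIncrementalStore({"2020-01": "jan", "2020-02": "feb"})

first = store.load()    # nothing confirmed yet, so everything is returned
second = store.load()   # still everything: loading alone moves no checkpoint
store.confirm()         # the explicit, manual checkpoint
third = store.load()    # nothing new since the checkpoint
store.add("2020-03", "mar")
fourth = store.load()   # only the partition added after the checkpoint
```

The point of the sketch is that nothing is skipped until `confirm()` is called explicitly, which is why this differs from Makefile-style tracking where staleness is detected automatically. In Kedro itself the confirmation is, as far as I know, requested through a node's `confirms` argument, so check the incremental dataset section of the docs for the exact usage in your version.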

Zain Patel