
I am trying to use Kedro to run a workflow. The following figure shows my workflow: nodes 1 to 3 are sequential, and nodes 31, 32 and 33 are three branches that all start from node 3. Kedro runs nodes 1 to 3 sequentially because of the clear dependencies among them. However, when it reaches nodes 31, 32 and 33, Kedro runs them in a seemingly random order: it can start with 31, 32 or 33. Does anyone have an idea how I can make Kedro run 31 first, then 32, and then 33? Thanks!

[Figure: workflow]

I have tried tagging all the nodes, but the run order is still random once Kedro reaches nodes 31 to 33.

1 Answer

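Kedro topologically sorts the nodes at runtime, so you're not guaranteed to get the same run order every time. A topological sort only guarantees that each node runs after the nodes whose outputs it consumes. Nodes 31, 32 and 33 depend only on node 3 and not on each other, so any order among them is valid.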
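To see why the branches have no fixed order, here is a minimal model of the graph from the question using Python's standard-library `graphlib` (Python 3.9+). This is not Kedro's own implementation, and the node names are simply the numbers from the question. The sketch groups nodes into batches that become runnable at the same time:

```python
from graphlib import TopologicalSorter

# Hypothetical graph mirroring the question: 1 -> 2 -> 3, then 3 fans out to 31, 32, 33.
# Each key maps a node to the set of nodes it depends on.
DEPENDENCIES = {
    "2": {"1"},
    "3": {"2"},
    "31": {"3"},
    "32": {"3"},
    "33": {"3"},
}


def ready_batches(graph):
    """Return the groups of nodes that become runnable together, in dependency order."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorter.get_ready()   # every node whose dependencies are all done
        batches.append(set(ready))
        sorter.done(*ready)          # mark them finished to unlock their successors
    return batches


for batch in ready_batches(DEPENDENCIES):
    print(sorted(batch))  # prints ['1'], ['2'], ['3'], then ['31', '32', '33']
```

The last batch contains 31, 32 and 33 together: the sort has no reason to prefer one of them, so a runner is free to execute them in any order.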

A common way to fix this is to break your pipeline into sub-pipelines and run them one after another via the CLI:

kedro run --pipeline a && kedro run --pipeline b
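To get 31, 32 and 33 as separate CLI runs, you would register one sub-pipeline per branch and chain them just like the `&&` line above. Below is a minimal Python sketch of that chaining. It assumes your project's pipeline registry defines pipelines named `a` (nodes 1 to 3), `b31`, `b32` and `b33`; those names are hypothetical, as are the helper functions, and `kedro` must be on your PATH:

```python
import subprocess
from typing import List, Sequence


def build_commands(pipeline_names: Sequence[str]) -> List[List[str]]:
    """One `kedro run --pipeline <name>` command per sub-pipeline, in the given order."""
    return [["kedro", "run", "--pipeline", name] for name in pipeline_names]


def run_in_order(pipeline_names: Sequence[str]) -> None:
    """Run each sub-pipeline in turn, stopping at the first failure (like `&&`)."""
    for command in build_commands(pipeline_names):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later pipelines never start after a failed one.
        subprocess.run(command, check=True)
```

Calling `run_in_order(["a", "b31", "b32", "b33"])` is then equivalent to chaining four `kedro run --pipeline ...` commands with `&&`. Because each command is a separate Kedro run, any dataset handed from one run to the next (for example node 3's output) has to be a persisted dataset in the catalog rather than an in-memory one.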

The other option is to create a dummy dataset dependency, which forces the nodes to run in the order you want.
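A `graphlib` model (standard library, not Kedro code) shows why a dummy dependency works. Suppose node 31 returns an extra, otherwise unused output that node 32 takes as an input, and node 32 does the same for node 33. In Kedro that would mean adding a dataset name to each node's `outputs` and to the next node's `inputs`; the names are up to you. The extra edges leave exactly one valid order:

```python
from graphlib import TopologicalSorter

# Hypothetical graph from the question, plus two dummy edges:
# 32 now also waits for 31, and 33 also waits for 32.
DEPENDENCIES = {
    "2": {"1"},
    "3": {"2"},
    "31": {"3"},
    "32": {"3", "31"},  # "31" stands in for a dummy dataset produced by node 31
    "33": {"3", "32"},  # "32" stands in for a dummy dataset produced by node 32
}

order = list(TopologicalSorter(DEPENDENCIES).static_order())
print(order)  # ['1', '2', '3', '31', '32', '33']
```

Since every node now has at most one runnable candidate at each step, the sort can no longer reorder the branches.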

datajoely
  • Thanks for the reply. You said that Kedro sorts the nodes at runtime. How does Kedro do this sort, i.e. based on what condition? – William Huang Sep 12 '21 at 09:04
  • Sorry - I didn't see this. We use a topological sort, you can see the implementation here: https://github.com/quantumblacklabs/kedro/blob/1dafdc00b11cf0c25c9c4b7412162e4068e70365/kedro/pipeline/pipeline.py#L41 – datajoely Oct 07 '21 at 15:50