
I have a question. Let's suppose I have a flow F that is replicated X times. Each replicated flow is then joined on the same key, but with a different dataset each time.

I want the joins to run in a parallel layout. In this case, do I need to use the "Partition by Key" component X times (one per Replicate output), or can I put a single one at the input of the Replicate?

TL;DR: Is this graph https://ibb.co/hHmk5e equivalent to https://ibb.co/i2NNJz, assuming all joins occur on the same key?
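As a rough illustration of why a parallel join needs its inputs partitioned by the join key, here is a minimal Python sketch. It is not Ab Initio code, and it assumes a generic hash partitioner. `partition_by_key`, `inner_join`, `parallel_join`, `flow_f` and `dataset_a` are hypothetical names. Because every record with a given key lands in the same partition index, joining partition i of one input only with partition i of the other gives the same rows as a serial join. This holds only when both inputs use the same hash function and the same number of partitions.

```python
from collections import defaultdict
from zlib import crc32


def partition_by_key(records, key, n):
    """Split records into n partitions using a stable hash of the key field."""
    parts = [[] for _ in range(n)]
    for rec in records:
        idx = crc32(str(rec[key]).encode()) % n
        parts[idx].append(rec)
    return parts


def inner_join(left, right, key):
    """Serial inner join of two lists of dicts on `key`."""
    index = defaultdict(list)
    for r in right:
        index[r[key]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def parallel_join(left, right, key, n):
    """Partition both inputs identically, then join partition i with partition i."""
    lparts = partition_by_key(left, key, n)
    rparts = partition_by_key(right, key, n)
    out = []
    for lp, rp in zip(lparts, rparts):
        out.extend(inner_join(lp, rp, key))
    return out


# Hypothetical sample data: flow F and one of the datasets it is joined with.
flow_f = [{"id": i, "f": f"f{i}"} for i in range(10)]
dataset_a = [{"id": i, "a": i * 10} for i in range(0, 10, 2)]

serial = inner_join(flow_f, dataset_a, "id")
parallel = parallel_join(flow_f, dataset_a, "id", n=4)


def by_id(r):
    return r["id"]


print(sorted(serial, key=by_id) == sorted(parallel, key=by_id))  # True
```

`crc32` is used instead of Python's built-in `hash()` so that the partition assignment stays the same across runs, like a deterministic partitioner would.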

Thank you,

LostReality

1 Answer

Use a Replicate feeding into multiple Partition by Key components. Be careful with checkpoints: if you have three checkpoints after the Replicate, consider removing them and placing a single checkpoint before the Replicate.

Chris Day