1

I am looking in the context of joining two streams using Flink and would like to understand how these two streams differ and affect how Flink processes them.

As a related question, I would also like to understand how CoProcessFunction would differ from a KeyedCoProcessFunction.

Sairam Sankaran
  • 309
  • 9
  • 18

1 Answers1

2

A KeyedStream is a DataStream that has been hash partitioned, with the effect that for any given key, every stream element for that key is in the same partition. This guarantees that all messages for a key are processed by the same worker instance. Only keyed streams can use key-partitioned state and timers.

A KeyedCoProcessFunction connects two streams that have been keyed in compatible ways -- the two streams are mapped to the same keyspace -- making it possible for the KeyedCoProcessFunction to have keyed state that relates to both streams. For example, you might want to join a stream of customer transactions with a stream of customer updates -- joining them on the customer_id. You would implement this in Flink (if doing so at a low level) by keying both streams by the customer_id, and connecting those keyed streams with a KeyedCoProcessFunction.

On the other hand, a CoProcessFunction has two inputs, but with no particular relationship between those inputs.

The Flink training has tutorials covering keyed streams and connected streams, and a related exercise/example.

David Anderson
  • 39,434
  • 4
  • 33
  • 60
  • Thank you so much David for the explanation. It makes sense now. I have a follow up question: How does scaling work in this case? Let's say there is a hot partition and the memory/disk cannot hold all the events, how will Flink ensure that the events are partitioned correctly to perform the join? – Sairam Sankaran Feb 18 '21 at 18:22
  • Thank you for asking about that in a new question. That way others can more easily find and benefit from the answer. I'll follow-up under https://stackoverflow.com/questions/66273158/how-does-flink-scale-for-hot-partitions. – David Anderson Feb 19 '21 at 11:23