
Let's say I have a function that processes a DataStream<X> and sends the result to a DB. I also need to read from another source, and when I process this new DataStream<Y> I need to look up, in state that I could build before storing DataStream<X> in the DB, an id that arrives in DataStream<Y>, and then trigger an action.

My question is:

Is it possible, for example with a CoProcessFunction in Flink, to process the result of the transformation on DataStream<X> and create the state there, and at the same time process DataStream<Y>, so that the state and the new stream are available in the same operator?

If the first question is totally wrong, which is possible, is there any other way to do what I need?

I hope someone can understand what I need to do.

This is a diagram of what I need to do: [image]

Alter

2 Answers


Yes, it is possible to connect two streams of different types, and process them together using shared state.

In order to connect Stream<X> with Stream<Y>, and to have them share state, you will have to define key selector functions that return equivalent keys for both streams. (Just as in SQL, where in order to join two tables, you have to describe how they can be joined.)

In this pseudocode, anotherFlinkFunction is a RichCoFlatMapFunction. I've assumed that both streams have an id field that has the same value when items from stream X and stream Y should be combined.

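// Process stream X and write the result to the database.
x = env.addSource(...);
xTransformed = x.flatMap(...);
xTransformed.addSink(DB);

// Read the second stream from another source.
y = env.addSource(...);

// Connect both streams, key them by the shared id, and process them
// together in one operator so they can share keyed state.
// (Lambda parameters are named xt/yt so they do not shadow the variable y.)
z = xTransformed
  .connect(y)
  .keyBy(xt -> xt.id, yt -> yt.id)
  .flatMap(new anotherFlinkFunction());

z.addSink(...);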

You'll find related examples in the Apache Flink training tutorials at https://ci.apache.org/projects/flink/flink-docs-stable/learn-flink/etl.html#example and in the accompanying exercise at https://github.com/apache/flink-training/tree/master/rides-and-fares.
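To make the shared-state idea more concrete, here is a minimal sketch in plain Java with no Flink dependency, so it models the logic rather than the real API. The class name `SharedStateSketch`, the methods `onX` and `onY`, and the string payloads are hypothetical. In an actual job, the two methods would roughly correspond to `flatMap1` and `flatMap2` of a `RichCoFlatMapFunction`, and the map would be replaced by keyed state (for example a `ValueState`) that Flink scopes to the current key after `keyBy`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of a keyed two-input operator (not the Flink API).
class SharedStateSketch {
    // Hypothetical stand-in for keyed state: last transformed X value per id.
    private final Map<String, String> lastXById = new HashMap<>();

    // Called for each element of the transformed X stream:
    // remember the value under its id and emit nothing.
    List<String> onX(String id, String value) {
        lastXById.put(id, value);
        return new ArrayList<>();
    }

    // Called for each element of stream Y: look up the state stored
    // for the same id and emit an action only if X was seen before.
    List<String> onY(String id, String payload) {
        List<String> out = new ArrayList<>();
        String stored = lastXById.get(id);
        if (stored != null) {
            out.add("action for " + id + ": " + stored + " + " + payload);
        }
        return out;
    }

    public static void main(String[] args) {
        SharedStateSketch op = new SharedStateSketch();
        op.onX("a", "x1");
        System.out.println(op.onY("a", "y1")); // id "a" was stored by onX
        System.out.println(op.onY("b", "y2")); // no X seen for id "b"
    }
}
```

The point of keying both inputs with equivalent keys is that an X element and a Y element with the same id reach the same state entry, which is what the single map models here.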

David Anderson
  • Yes, this is exactly what I've done so far, but one question: is it possible to connect without `keyBy(xt->xt.id, y->y.id)`, just like this: `ConnectedStreams connectedStreams = StreamX.connect(StreamY);`, and then call `.flatMap(new anotherFlinkFunction());`? I've done this, but I'm not sure it is okay, because the idea is to have values in the state for `Stream`, but `Stream` is not always sending records as `Stream` so I cannot do the `keyBy`, or can I? – Alter Nov 22 '20 at 22:57
  • On the other hand, I have this question: the result of doing this is a `ConnectedStreams`. How can I add a sink to this data type? `addSink()` is not available on it. – Alter Nov 22 '20 at 23:16
  • Yes, both streams have an `id` field with the same value. – Alter Nov 22 '20 at 23:17
  • I don't understand your first comment above, where you ask "`Stream` is not always sending records as `Stream` so I cannot do the keyBy, or can I?" – David Anderson Nov 23 '20 at 09:14
  • That was a wrong question on my part, sorry about that. Do you know of a link that shows how to add a sink to a `ConnectedStreams`? – Alter Nov 23 '20 at 13:02
  • It's not possible to directly connect a sink to a `ConnectedStreams`. You must first process the connected stream, using something like a `RichCoFlatMapFunction` or `CoProcessFunction`. If instead you just want to merge two streams together, you could use `union`. – David Anderson Nov 23 '20 at 14:17
  • Yes, you were right, I forgot to get the `SingleOutputStreamOperator` from the `RichCoFlatMapFunction`. – Alter Nov 23 '20 at 14:23

Full answer here:

DataStream<X> streamX = fromSource();
DataStream<Y> streamY = fromSource();

// Key both inputs by id so that they share keyed state.
ConnectedStreams<X, Y> connectedStreams = streamX.connect(streamY).keyBy(x -> x.id, y -> y.id);

/* JoinStreamsFunction receives X and Y and emits Z as output */
SingleOutputStreamOperator<Z> out = connectedStreams.flatMap(new JoinStreamsFunction());
out.addSink(new SinkFunctionHere());
Alter