
In my Java application, I have three DataStreams. One stream consumes data from Kafka and another consumes data from Apache NiFi. The two streams carry different object types: Stream-1 carries Person objects and Stream-2 carries Address objects.

The third one is the broadcast stream (its data is also consumed from Kafka).

Now I want to combine Stream-1 and Stream-2 in the job class and split them again in the task's process-element method. How can I implement this?

Note: Stream-1 is the main stream and Stream-2 is the side input. The main stream continuously fetches data from Kafka. For the side input, all table data is loaded from the DB when the application starts, and new data is then read whenever the table is updated (which happens infrequently).

Sample structure:

DataStream<Person> stream1 = env.addSource(...);                   // read data from Kafka
DataStream<Address> stream2 = env.addSource(...);                  // read data from NiFi
BroadcastStream<String> broadcastStream = stream3.broadcast(...);  // stream3 reads data from Kafka

I have looked at the following links:

FLIP-17 Side Inputs for DataStream API

jira/browse/FLINK-6131

My use case is:

Join stream with slowly evolving data: The side input that we use for enrichment evolves over time (the data is read from the DB). This can be done by waiting for some initial data to be available before processing the main input and then continuously ingesting new data into the internal side-input structure as it arrives.
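
The "wait for initial side data, then keep ingesting updates" behaviour described above can be modelled in plain Java. This is only a sketch of the per-key logic, not Flink code: the Person and Address fields, the SideInputEnricher class, and its method names are all hypothetical. In a Flink job the same two callbacks would roughly correspond to processElement1 and processElement2 of a CoProcessFunction applied to stream1.connect(stream2), with the maps replaced by keyed state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical element types; the real Person and Address classes are not shown in the question.
final class Person {
    final String id;
    final String name;
    Person(String id, String name) { this.id = id; this.name = name; }
}

final class Address {
    final String personId;
    final String city;
    Address(String personId, String city) { this.personId = personId; this.city = city; }
}

// Plain-Java model of a two-input enrichment operator (an assumption-laden sketch, no Flink API used).
class SideInputEnricher {
    // Latest side-input value per key.
    private final Map<String, Address> latestAddress = new HashMap<>();
    // Main-input elements that arrived before any side data for their key.
    private final Map<String, List<Person>> waiting = new HashMap<>();
    private final List<String> output = new ArrayList<>();

    // Main input: enrich immediately if side data exists, otherwise buffer.
    void onPerson(Person p) {
        Address a = latestAddress.get(p.id);
        if (a == null) {
            waiting.computeIfAbsent(p.id, k -> new ArrayList<>()).add(p);
        } else {
            output.add(enrich(p, a));
        }
    }

    // Side input: remember the newest value and flush anything buffered for that key.
    void onAddress(Address a) {
        latestAddress.put(a.personId, a);
        List<Person> buffered = waiting.remove(a.personId);
        if (buffered != null) {
            for (Person p : buffered) {
                output.add(enrich(p, a));
            }
        }
    }

    private static String enrich(Person p, Address a) {
        return p.name + "," + a.city;
    }

    List<String> output() {
        return output;
    }
}
```

Because the two inputs arrive through separate callbacks, they can have unrelated types; only the key (here a String id) has to line up.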

Azhagesan
  • Could you update your question to explain why a join over the three sources is not sufficient? Also have a look at [temporal joins](https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/streaming/joins.html#join-with-a-temporal-table-function). – Arvid Heise Jun 02 '20 at 09:14
  • A join is not sufficient, because in my case each stream has a different type. As far as I understand, a join is applicable only to streams of the same type. – Azhagesan Jun 02 '20 at 09:39
  • What do you mean exactly? You can easily join stream1 and stream2 even if they have different types. Then you can add the broadcast to the result. – Arvid Heise Jun 02 '20 at 09:46
  • Hi Arvid, thanks for the details. Let me try it. – Azhagesan Jun 02 '20 at 10:00
  • Hi Arvid Heise, I am new to Flink. Do you have any sample code for joining streams of different types and adding a broadcast? – Azhagesan Jun 02 '20 at 10:06
  • [Joins in datastream](https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/operators/joining.html) are used in [this example](https://github.com/apache/flink/blob/master/flink-examples/flink-examples-streaming/src/main/java/org/apache/flink/streaming/examples/join/WindowJoin.java) and many more. – Arvid Heise Jun 02 '20 at 11:06
  • Thanks, Arvid Heise – Azhagesan Jun 02 '20 at 11:50

1 Answer


Based on the latest response, the recommendation by @Arvid was in fact what was needed here.

Core of the answer:

You can easily join stream1 and stream2 even if they have different types. Then you can add the broadcast to the result

Links to the doc and the example, and a relevant snippet from the doc (the example is too long to include here):

import org.apache.flink.api.common.functions.JoinFunction;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

...

DataStream<Integer> orangeStream = ...
DataStream<Integer> greenStream = ...

// Join elements that share a key and fall into the same 2 ms tumbling event-time window.
orangeStream.join(greenStream)
    .where(<KeySelector>)
    .equalTo(<KeySelector>)
    .window(TumblingEventTimeWindows.of(Time.milliseconds(2)))
    .apply(new JoinFunction<Integer, Integer, String>() {
        @Override
        public String join(Integer first, Integer second) {
            return first + "," + second;
        }
    });
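
The two inputs of a join do not have to share an element type; only the keys extracted from them must be of the same type. The sketch below models that idea in plain Java without any Flink dependency. PairJoiner and KeyedJoin are hypothetical names: PairJoiner merely mirrors the three-type-parameter shape of Flink's JoinFunction<IN1, IN2, OUT>, and the windowing that a real Flink join requires is left out entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in with the same shape as a two-input join function:
// two independent input types and one output type.
interface PairJoiner<A, B, R> {
    R join(A first, B second);
}

class KeyedJoin {
    // Pairs every left element with every right element that has an equal key.
    // A and B may differ freely; only the key type K is shared.
    static <A, B, K, R> List<R> join(List<A> left, Function<A, K> leftKey,
                                     List<B> right, Function<B, K> rightKey,
                                     PairJoiner<A, B, R> joiner) {
        // Index the right side by key so each left element needs one lookup.
        Map<K, List<B>> rightByKey = new HashMap<>();
        for (B b : right) {
            rightByKey.computeIfAbsent(rightKey.apply(b), k -> new ArrayList<>()).add(b);
        }
        List<R> out = new ArrayList<>();
        for (A a : left) {
            for (B b : rightByKey.getOrDefault(leftKey.apply(a), List.of())) {
                out.add(joiner.join(a, b));
            }
        }
        return out;
    }
}
```

For instance, joining a list of Integer ids with a list of String labels of the form "2:two" would use `i -> i` as the left key, a function that parses the number before the colon as the right key, and `(i, s) -> i + "," + s` as the joiner. In the question's terms, the left side would be Person, the right side Address, and the shared key whatever field links them.
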
Dennis Jaheruddin
  • Thanks, Dennis, for the info. One question: in your example code block, orangeStream and greenStream have the same data type (both are Integer). If one is an Integer and the other is a String, can we still apply the join() function? – Azhagesan Jul 27 '20 at 13:15
  • @Azhagesan I can't check right now, but conceptually the keys need to 'fit' where the rest can be made together however you want. For instance the examples already show how you take two integers and make a string, I would expect it to be trivial to, for example: take an integer and a string and make a string. -- In the worst case you could explicitly cast the field to a different datatype. – Dennis Jaheruddin Jul 27 '20 at 18:35
  • Thanks, Dennis . Let me check – Azhagesan Jul 28 '20 at 07:27