
My question is regarding the Apache Flink framework.

Is there any way to support more than one streaming source, like Kafka and Twitter, in a single Flink job? Is there any workaround? Can we process more than one streaming source at a time in a single Flink job?

I am currently working with Spark Streaming, and this is a limitation there.

Is this achievable with other streaming frameworks like Apache Samza, Storm, or NiFi?

Response is much awaited.

Dennis Jaheruddin
Sadaf

2 Answers


Yes, this is possible in Flink and Storm (no clue about Samza or NiFi...)

You can add as many source operators as you want and each can consume from a different source.

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

Properties properties = ... // see Flink webpage for more details

DataStream<String> stream1 = env.addSource(new FlinkKafkaConsumer08<>("topic", new SimpleStringSchema(), properties));
DataStream<String> stream2 = env.readTextFile("/tmp/myFile.txt");

DataStream<String> allStreams = stream1.union(stream2);
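
For the Kafka-plus-Twitter combination the question asks about, a sketch using Flink's Twitter connector (`flink-connector-twitter`) could look like the following. The property values and topic name are placeholders; `union` requires both streams to have the same type, which is why the tweets arrive as raw JSON strings here.

```java
import java.util.Properties;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer08;
import org.apache.flink.streaming.connectors.twitter.TwitterSource;
import org.apache.flink.streaming.util.serialization.SimpleStringSchema;

public class TwoSourcesJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Kafka source: connection settings are placeholders
        Properties kafkaProps = new Properties();
        kafkaProps.setProperty("bootstrap.servers", "localhost:9092");
        kafkaProps.setProperty("group.id", "demo");
        DataStream<String> kafkaStream =
            env.addSource(new FlinkKafkaConsumer08<>("topic", new SimpleStringSchema(), kafkaProps));

        // Twitter source: credentials are placeholders
        Properties twitterProps = new Properties();
        twitterProps.setProperty(TwitterSource.CONSUMER_KEY, "...");
        twitterProps.setProperty(TwitterSource.CONSUMER_SECRET, "...");
        twitterProps.setProperty(TwitterSource.TOKEN, "...");
        twitterProps.setProperty(TwitterSource.TOKEN_SECRET, "...");
        DataStream<String> tweetStream = env.addSource(new TwitterSource(twitterProps));

        // Both streams are DataStream<String>, so they can be unioned
        // and processed by the same downstream operators
        DataStream<String> allStreams = kafkaStream.union(tweetStream);
        allStreams.print();

        env.execute("two-sources-job");
    }
}
```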

For Storm's low-level API, the pattern is similar. See An Apache Storm bolt receive multiple input tuples from different spout/bolt
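
The same two-source pattern in Storm is a sketch along these lines: two spouts feed one bolt via two groupings. `TwitterSampleSpout` and `MergeBolt` are hypothetical user-defined classes (Storm ships no official Twitter spout), and the Kafka spout configuration is omitted.

```java
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.generated.StormTopology;

// Build a topology where one bolt consumes tuples from two independent spouts.
TopologyBuilder builder = new TopologyBuilder();

// Two sources: a Kafka spout (from the storm-kafka module) and a
// hypothetical custom Twitter spout
builder.setSpout("kafka", new KafkaSpout(kafkaSpoutConfig));
builder.setSpout("twitter", new TwitterSampleSpout());

// One bolt subscribed to both spouts; it receives the merged tuple stream
builder.setBolt("merge", new MergeBolt())
       .shuffleGrouping("kafka")
       .shuffleGrouping("twitter");

StormTopology topology = builder.createTopology();
```

Inside the bolt, `tuple.getSourceComponent()` tells you which spout a tuple came from, if the two streams need different handling.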

Matthias J. Sax
  • Right. Thanks for the answer. Can we add this Flink receiver in a Spark project? Is there any middleware to join Flink streaming with Apache Spark? – Sadaf Nov 07 '16 at 06:45
  • I never used Spark. No clue. Furthermore, I am not aware of any middleware to combine Flink and Spark -- and I am wondering why you want to do this in the first place... – Matthias J. Sax Nov 08 '16 at 04:21
  • Actually I am working on a Spark project, but I can't stream data from multiple streaming sources in a single job there using Spark Streaming. So I want to overcome this problem using Flink, and I really want to know how to join the two. – Sadaf Nov 08 '16 at 08:07
  • I have no idea how this could be done... For sure, you cannot mix both in the same application code. You might want to use a layer in between. For example, do some processing with Flink, write the result somewhere (maybe Kafka), and read it into Spark afterwards. – Matthias J. Sax Nov 08 '16 at 20:58
  • Yes. This is what I wanted to know. Thanks :) – Sadaf Nov 10 '16 at 08:08
  • There is a typo: `see` should be `env` – lasclocker Jun 12 '19 at 07:16
  • Thanks @lasclocker. Feel free to edit directly next time :) – Matthias J. Sax Jun 12 '19 at 07:27

Some solutions have already been covered; I just want to add that in a NiFi flow you can ingest many different sources and process them either separately or together.

It is also possible to ingest a source, and have multiple teams build flows on this without needing to ingest the data multiple times.

Dennis Jaheruddin