
Is there any way to use a broadcast join in Flink the same way I used it in Spark? I'm working with joins, but the data is large, so I need a broadcast join.

Thank You

ASK5
  • Not very sure what exactly you want. Is this what you want? https://ci.apache.org/projects/flink/flink-docs-stable/dev/table/streaming/joins.html – Jiayi Liao Oct 17 '19 at 06:06
  • Hi @JiayiLiao, do you know what a broadcast join does in Spark? Here's a link describing it: https://www.oreilly.com/library/view/high-performance-spark/9781491943199/ch04.html It'd be really great if you could help me with this. I want the same behavior in Flink. – ASK5 Oct 17 '19 at 06:22

1 Answer

2

Flink does not provide a broadcast join like the one in Spark. It's pretty easy to implement one yourself using a BroadcastProcessFunction, but I wonder if it is really appropriate. A broadcast join only makes sense if one of the two streams is fairly small; otherwise a key-partitioned join makes a lot more sense.

To implement this, broadcast the smaller pattern stream and connect it to the event stream. In the processBroadcastElement method of a BroadcastProcessFunction, store the new pattern in broadcast state, and in the processElement method look up the relevant pattern and combine it with the event that is being processed.
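The steps above could be sketched roughly as follows with the DataStream API. This is only a minimal illustration, not a definitive implementation: the class names `BroadcastJoinSketch` and `JoinFunction`, the `(key, value)` tuple shapes, the sample elements, and the state name `"patterns"` are all assumptions made up for the example, and it needs the Flink streaming dependency on the classpath.

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.state.ReadOnlyBroadcastState;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class BroadcastJoinSketch {

    // Describes the broadcast state: join key -> value from the small stream.
    // The name "patterns" and the String/String types are assumptions.
    static final MapStateDescriptor<String, String> PATTERNS =
            new MapStateDescriptor<>("patterns", Types.STRING, Types.STRING);

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical inputs: (key, payload) pairs. Replace with real sources.
        DataStream<Tuple2<String, String>> events = env.fromElements(
                Tuple2.of("k1", "event-a"), Tuple2.of("k2", "event-b"));
        DataStream<Tuple2<String, String>> patterns = env.fromElements(
                Tuple2.of("k1", "pattern-x"), Tuple2.of("k2", "pattern-y"));

        // Broadcast the smaller stream to every parallel instance.
        BroadcastStream<Tuple2<String, String>> broadcastPatterns =
                patterns.broadcast(PATTERNS);

        // Connect the large stream to the broadcast stream and join them.
        events.connect(broadcastPatterns)
              .process(new JoinFunction())
              .print();

        env.execute("broadcast-join-sketch");
    }

    // Emits (key, event payload, pattern payload) for each event with a match.
    static class JoinFunction extends BroadcastProcessFunction<
            Tuple2<String, String>,
            Tuple2<String, String>,
            Tuple3<String, String, String>> {

        @Override
        public void processBroadcastElement(
                Tuple2<String, String> pattern,
                Context ctx,
                Collector<Tuple3<String, String, String>> out) throws Exception {
            // Store (or overwrite) the pattern under its join key.
            ctx.getBroadcastState(PATTERNS).put(pattern.f0, pattern.f1);
        }

        @Override
        public void processElement(
                Tuple2<String, String> event,
                ReadOnlyContext ctx,
                Collector<Tuple3<String, String, String>> out) throws Exception {
            // Look up the pattern for this event's key; read-only on this side.
            ReadOnlyBroadcastState<String, String> state =
                    ctx.getBroadcastState(PATTERNS);
            String pattern = state.get(event.f0);
            if (pattern != null) {
                out.collect(Tuple3.of(event.f0, event.f1, pattern));
            }
        }
    }
}
```

Note that Flink gives no ordering guarantee between the two inputs, so an event can arrive before its pattern has been broadcast. This sketch simply drops such events (inner-join behavior on whatever state is present at that moment); buffering unmatched events until a pattern arrives would be one way to handle that, at the cost of extra state.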

David Anderson
  • Hi @DavidAnderson, thanks for your answer. I need it for joining two DataStreams, and yes, there is quite a large difference in size between the two. – ASK5 Oct 17 '19 at 07:29
  • Can you show me with a snippet how to use it, if one table is named event and the other one patt? Link: https://github.com/ASKRAJPUT5/flink-Table/blob/master/src/main/scala/flinkTable25.scala – ASK5 Oct 17 '19 at 07:31