
I'm trying to update documents in Elasticsearch from Kafka messages (using Kafka as the stream source). Writing to Elasticsearch in bulk, using windows and the Elasticsearch connector as a sink, works fine. However, we also need to update existing data in the documents and read it in a bulk-performant manner - not for every tuple, but e.g. for a whole window after a keyBy() split that we want to aggregate over.
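For reference, the bulk-write path we already have looks roughly like the sketch below (flink-connector-elasticsearch6 style; host, index and field names are simplified placeholders, the real aggregation is omitted):

    import org.apache.flink.api.common.functions.RuntimeContext;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;
    import org.apache.flink.streaming.connectors.elasticsearch.RequestIndexer;
    import org.apache.flink.streaming.connectors.elasticsearch6.ElasticsearchSink;
    import org.apache.http.HttpHost;
    import org.elasticsearch.client.Requests;

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class BulkWriteSketch {

        // Attach the bulk-writing Elasticsearch sink to the already windowed/aggregated stream.
        public static void addBulkSink(DataStream<String> windowed) {
            List<HttpHost> httpHosts =
                    Collections.singletonList(new HttpHost("localhost", 9200, "http"));

            ElasticsearchSink.Builder<String> esSinkBuilder = new ElasticsearchSink.Builder<>(
                    httpHosts,
                    new ElasticsearchSinkFunction<String>() {
                        @Override
                        public void process(String element, RuntimeContext ctx, RequestIndexer indexer) {
                            Map<String, String> json = new HashMap<>();
                            json.put("data", element);             // placeholder document body
                            indexer.add(Requests.indexRequest()
                                    .index("my-index")             // placeholder index
                                    .type("_doc")
                                    .source(json));
                        }
                    });
            esSinkBuilder.setBulkFlushMaxActions(500);             // write in bulks of 500 actions

            windowed.addSink(esSinkBuilder.build());
        }
    }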

We are using Storm Trident right now, which performs bulk reads before a persistentAggregate and writes the updated aggregations back afterwards, minimizing interaction with the backend. I just can't find anything similar in Flink. Any hints?

Peter Neubauer

1 Answer


How about running two window calls on the stream:

window1 - to bulk-read the existing documents from Elasticsearch

window2 - to bulk-write the updated documents back into Elasticsearch.

streamData
  .window1(bulkRead and update/join)
  .processFunction(...)
  .window2(BulkPush)
  • You can use any suitable bulk-read method in window1, similar to what Storm Trident does.
  • Use the Elasticsearch BulkProcessor for the bulk writes in window2 (a rough sketch follows below).
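A rough sketch of what this could look like in Flink's DataStream API (Java). This is only an illustration under some assumptions: a keyed stream of (String key, Long count) pairs, the Elasticsearch high-level REST client in its 7.x form (6.x additionally needs a document type on every request), and placeholder names such as my-index and count:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
    import org.apache.flink.util.Collector;
    import org.apache.http.HttpHost;
    import org.elasticsearch.action.get.MultiGetItemResponse;
    import org.elasticsearch.action.get.MultiGetRequest;
    import org.elasticsearch.action.get.MultiGetResponse;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestHighLevelClient;

    /**
     * "window1": per keyed window, bulk-read the current document values with a single
     * mget, merge them with the new events of the window, and emit the updated pairs.
     * Index name, field name and the counting logic are placeholders.
     */
    public class BulkReadAndUpdate
            extends ProcessWindowFunction<Tuple2<String, Long>, Tuple2<String, Long>, String, TimeWindow> {

        private transient RestHighLevelClient client;

        @Override
        public void open(Configuration parameters) {
            client = new RestHighLevelClient(
                    RestClient.builder(new HttpHost("localhost", 9200, "http")));
        }

        @Override
        public void process(String key,
                            Context context,
                            Iterable<Tuple2<String, Long>> elements,
                            Collector<Tuple2<String, Long>> out) throws Exception {

            // Sum the new events of this window for the key.
            long windowCount = 0;
            for (Tuple2<String, Long> e : elements) {
                windowCount += e.f1;
            }

            // One bulk read for the key (with a coarser keyBy you could batch many ids here).
            MultiGetRequest mget = new MultiGetRequest();
            mget.add("my-index", key);
            MultiGetResponse response = client.mget(mget, RequestOptions.DEFAULT);

            long existing = 0;
            for (MultiGetItemResponse item : response.getResponses()) {
                if (!item.isFailed() && item.getResponse().isExists()) {
                    Object value = item.getResponse().getSourceAsMap().get("count");
                    if (value instanceof Number) {
                        existing = ((Number) value).longValue();
                    }
                }
            }

            // Emit the merged value; "window2" (the sink below) writes it back in bulk.
            out.collect(Tuple2.of(key, existing + windowCount));
        }

        @Override
        public void close() throws Exception {
            if (client != null) {
                client.close();
            }
        }
    }

The merged results can then be pushed back in bulk, either through the Flink Elasticsearch sink or through a small sink that wraps the BulkProcessor, e.g.:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
    import org.apache.http.HttpHost;
    import org.elasticsearch.action.bulk.BulkProcessor;
    import org.elasticsearch.action.bulk.BulkRequest;
    import org.elasticsearch.action.bulk.BulkResponse;
    import org.elasticsearch.action.update.UpdateRequest;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.common.unit.TimeValue;

    import java.util.Collections;

    /** "window2": push the merged values back with the Elasticsearch BulkProcessor. */
    public class BulkUpdateSink extends RichSinkFunction<Tuple2<String, Long>> {

        private transient RestHighLevelClient client;
        private transient BulkProcessor bulkProcessor;

        @Override
        public void open(Configuration parameters) {
            client = new RestHighLevelClient(
                    RestClient.builder(new HttpHost("localhost", 9200, "http")));
            bulkProcessor = BulkProcessor.builder(
                    (request, listener) -> client.bulkAsync(request, RequestOptions.DEFAULT, listener),
                    new BulkProcessor.Listener() {
                        @Override public void beforeBulk(long id, BulkRequest request) { }
                        @Override public void afterBulk(long id, BulkRequest request, BulkResponse response) { }
                        @Override public void afterBulk(long id, BulkRequest request, Throwable failure) { }
                    })
                    .setBulkActions(500)                             // flush every 500 actions
                    .setFlushInterval(TimeValue.timeValueSeconds(5)) // or every 5 seconds
                    .build();
        }

        @Override
        public void invoke(Tuple2<String, Long> value, Context context) {
            bulkProcessor.add(new UpdateRequest("my-index", value.f0)   // placeholder index
                    .doc(Collections.<String, Object>singletonMap("count", value.f1))
                    .docAsUpsert(true));
        }

        @Override
        public void close() throws Exception {
            if (bulkProcessor != null) {
                bulkProcessor.close();
            }
            if (client != null) {
                client.close();
            }
        }
    }

Wired together it would look something like:

    stream
        .keyBy(t -> t.f0)
        .timeWindow(Time.seconds(10))
        .process(new BulkReadAndUpdate())
        .addSink(new BulkUpdateSink());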
Amrit Jangid
  • Thanks for the answer! This needs loading for EVERY window1, as opposed to having an internal cache and only loading the keys that are not already present. But you are right that I will probably have to load, update and save everything in the processFunction; it just seems very manual and non-performant with respect to database access and bulk reading/writing. Thanks for the hint though! – Peter Neubauer May 29 '19 at 13:09