
I'm trying to update documents in Elasticsearch from Kafka messages (using Kafka as the stream source). Writing to Elasticsearch in bulk, using windows and the Elasticsearch connector as a sink, works fine. However, we also need to update existing data in the documents and read it in a bulk-performant manner - not for every tuple, but e.g. for a whole window after a keyBy() split that we want to aggregate over.
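For reference, the bulk-write path we already have looks roughly like the sketch below (flink-connector-elasticsearch6 style; host, index and field names are simplified placeholders, the real aggregation is omitted):

    import org.apache.flink.api.common.functions.RuntimeContext;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;
    import org.apache.flink.streaming.connectors.elasticsearch.RequestIndexer;
    import org.apache.flink.streaming.connectors.elasticsearch6.ElasticsearchSink;
    import org.apache.http.HttpHost;
    import org.elasticsearch.client.Requests;

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class BulkWriteSketch {

        // Attach the bulk-writing Elasticsearch sink to the already windowed/aggregated stream.
        public static void addBulkSink(DataStream<String> windowed) {
            List<HttpHost> httpHosts =
                    Collections.singletonList(new HttpHost("localhost", 9200, "http"));

            ElasticsearchSink.Builder<String> esSinkBuilder = new ElasticsearchSink.Builder<>(
                    httpHosts,
                    new ElasticsearchSinkFunction<String>() {
                        @Override
                        public void process(String element, RuntimeContext ctx, RequestIndexer indexer) {
                            Map<String, String> json = new HashMap<>();
                            json.put("data", element);             // placeholder document body
                            indexer.add(Requests.indexRequest()
                                    .index("my-index")             // placeholder index
                                    .type("_doc")
                                    .source(json));
                        }
                    });
            esSinkBuilder.setBulkFlushMaxActions(500);             // write in bulks of 500 actions

            windowed.addSink(esSinkBuilder.build());
        }
    }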

We are using Storm Trident right now, which performs bulk reads before a persistentAggregate and writes the updated aggregations back afterwards, minimizing interaction with the backend. I just can't find anything similar in Flink. Any hints?

Peter Neubauer

1 Answer


How about running two window calls on the stream:

window1 - to bulk-read the existing documents from Elasticsearch

window2 - to bulk-write the updated documents back into Elasticsearch.

streamData
  .window1(bulkRead and update/join)
  .processFunction(...)
  .window2(BulkPush)
  • You can use any suitable bulk-read method in window1, similar to what Storm Trident does.
  • Use the Elasticsearch BulkProcessor for the bulk writes in window2 (a rough sketch follows below).
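A rough sketch of what this could look like in Flink's DataStream API (Java). This is only an illustration under some assumptions: a keyed stream of (String key, Long count) pairs, the Elasticsearch high-level REST client in its 7.x form (6.x additionally needs a document type on every request), and placeholder names such as my-index and count:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
    import org.apache.flink.util.Collector;
    import org.apache.http.HttpHost;
    import org.elasticsearch.action.get.MultiGetItemResponse;
    import org.elasticsearch.action.get.MultiGetRequest;
    import org.elasticsearch.action.get.MultiGetResponse;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestHighLevelClient;

    /**
     * "window1": per keyed window, bulk-read the current document values with a single
     * mget, merge them with the new events of the window, and emit the updated pairs.
     * Index name, field name and the counting logic are placeholders.
     */
    public class BulkReadAndUpdate
            extends ProcessWindowFunction<Tuple2<String, Long>, Tuple2<String, Long>, String, TimeWindow> {

        private transient RestHighLevelClient client;

        @Override
        public void open(Configuration parameters) {
            client = new RestHighLevelClient(
                    RestClient.builder(new HttpHost("localhost", 9200, "http")));
        }

        @Override
        public void process(String key,
                            Context context,
                            Iterable<Tuple2<String, Long>> elements,
                            Collector<Tuple2<String, Long>> out) throws Exception {

            // Sum the new events of this window for the key.
            long windowCount = 0;
            for (Tuple2<String, Long> e : elements) {
                windowCount += e.f1;
            }

            // One bulk read for the key (with a coarser keyBy you could batch many ids here).
            MultiGetRequest mget = new MultiGetRequest();
            mget.add("my-index", key);
            MultiGetResponse response = client.mget(mget, RequestOptions.DEFAULT);

            long existing = 0;
            for (MultiGetItemResponse item : response.getResponses()) {
                if (!item.isFailed() && item.getResponse().isExists()) {
                    Object value = item.getResponse().getSourceAsMap().get("count");
                    if (value instanceof Number) {
                        existing = ((Number) value).longValue();
                    }
                }
            }

            // Emit the merged value; "window2" (the sink below) writes it back in bulk.
            out.collect(Tuple2.of(key, existing + windowCount));
        }

        @Override
        public void close() throws Exception {
            if (client != null) {
                client.close();
            }
        }
    }

The merged results can then be pushed back in bulk, either through the Flink Elasticsearch sink or through a small sink that wraps the BulkProcessor, e.g.:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
    import org.apache.http.HttpHost;
    import org.elasticsearch.action.bulk.BulkProcessor;
    import org.elasticsearch.action.bulk.BulkRequest;
    import org.elasticsearch.action.bulk.BulkResponse;
    import org.elasticsearch.action.update.UpdateRequest;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.common.unit.TimeValue;

    import java.util.Collections;

    /** "window2": push the merged values back with the Elasticsearch BulkProcessor. */
    public class BulkUpdateSink extends RichSinkFunction<Tuple2<String, Long>> {

        private transient RestHighLevelClient client;
        private transient BulkProcessor bulkProcessor;

        @Override
        public void open(Configuration parameters) {
            client = new RestHighLevelClient(
                    RestClient.builder(new HttpHost("localhost", 9200, "http")));
            bulkProcessor = BulkProcessor.builder(
                    (request, listener) -> client.bulkAsync(request, RequestOptions.DEFAULT, listener),
                    new BulkProcessor.Listener() {
                        @Override public void beforeBulk(long id, BulkRequest request) { }
                        @Override public void afterBulk(long id, BulkRequest request, BulkResponse response) { }
                        @Override public void afterBulk(long id, BulkRequest request, Throwable failure) { }
                    })
                    .setBulkActions(500)                             // flush every 500 actions
                    .setFlushInterval(TimeValue.timeValueSeconds(5)) // or every 5 seconds
                    .build();
        }

        @Override
        public void invoke(Tuple2<String, Long> value, Context context) {
            bulkProcessor.add(new UpdateRequest("my-index", value.f0)   // placeholder index
                    .doc(Collections.<String, Object>singletonMap("count", value.f1))
                    .docAsUpsert(true));
        }

        @Override
        public void close() throws Exception {
            if (bulkProcessor != null) {
                bulkProcessor.close();
            }
            if (client != null) {
                client.close();
            }
        }
    }

Wired together it would look something like:

    stream
        .keyBy(t -> t.f0)
        .timeWindow(Time.seconds(10))
        .process(new BulkReadAndUpdate())
        .addSink(new BulkUpdateSink());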
Amrit Jangid
  • Thanks for the answer! This needs loading for EVERY window1, as opposed to having an internal cache and only loading the keys that are not already present. But you are right that I will probably have to load, update and save everything in the processFunction; it just seems very manual and non-performant with respect to database access and bulk reading/writing. Thanks for the hint though! – Peter Neubauer May 29 '19 at 13:09