I have done a lot of research on this, but I am still not able to find something suitable. Everywhere I go, I see that the easiest way is to call saveToEs() and then commit the offsets afterwards. My question is: what if saveToEs() fails for some reason?
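For context, the commonly recommended pattern looks roughly like the following (a minimal sketch, assuming a spark-streaming-kafka-0-10 direct stream and the elasticsearch-hadoop connector; the index name is a placeholder):

    import org.apache.spark.streaming.kafka010.{CanCommitOffsets, HasOffsetRanges}
    import org.elasticsearch.spark._ // adds saveJsonToEs to RDDs

    // `stream` is the InputDStream returned by KafkaUtils.createDirectStream
    stream.foreachRDD { rdd =>
      // Capture the offset ranges from the source RDD before any transformation,
      // since only the Kafka RDD carries the offset metadata
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

      // Write the batch to Elasticsearch. es-hadoop throws by default when a
      // bulk write ultimately fails, which fails the batch and skips the commit
      rdd.map(_.value).saveJsonToEs("my-index/_doc") // assumes JSON string values

      // Reached only if saveJsonToEs did not throw: hand the offsets back to
      // the stream to be committed to Kafka on a later batch boundary
      stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
    }

Note that this gives at-least-once rather than exactly-once semantics: if the job dies after the write but before the asynchronous commit lands, the batch is replayed on restart. The usual mitigation is to make the writes idempotent, e.g. by deriving the document id from the record via es.mapping.id, so a replay overwrites instead of duplicating.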

What is the correct way to store offsets in Kafka when we're using a Spark Streaming job and storing our documents in ES? I tried using a BulkProcessorListener and storing the offsets manually (keeping track of sorted offsets, requests, and what not), but it got out of hand and the approach seemed too complicated for such a common task.

Can someone guide me?

For anyone interested in my approach, here is the question that explains it: Commit Offsets to Kafka on Spark Executors

  • You're best off writing your data back to Kafka from Spark, and then you can use Kafka Connect to stream it to Elasticsearch. That's what Kafka Connect is designed to do. If that would be of interest then I can write an answer explaining how. – Robin Moffatt Nov 05 '19 at 14:51
  • @RobinMoffatt isn't there any way I can do it with Spark in the middle? I am filtering and enriching my events (this is what Spark is doing) and then storing them in ES – alina Nov 05 '19 at 16:03
  • you can, but it's not always the best approach. Do your filtering and enriching with Spark, and then use Kafka Connect to reliably stream the processed data to Elasticsearch (a sample sink-connector config is sketched after these comments). Each tool does what it's good at. – Robin Moffatt Nov 05 '19 at 16:06
  • @RobinMoffatt you're right, but right now we don't have the option of changing the tool. However, I would still like to know how we can do that; maybe in the future we can use it – alina Nov 05 '19 at 16:07
  • It's hard to handle the failure scenario, and treating the write and the offset commit as a single transaction will always be a challenge... that is something you need to handle in your code... the best approach is the one @RobinMoffatt mentioned... use the ES connector... it's very reliable – Nitin Nov 05 '19 at 17:38
  • @sun_007 ES connector? You mean use the Kafka connector that writes directly to ES, and let Kafka and ES worry about handling failures and committing offsets? Does this connector provide that functionality out of the box, or does it also require some hacky workaround? – alina Nov 05 '19 at 20:33
  • Yes, the connector is an out-of-the-box solution; it's part of Confluent but open source... try it and post a question if you face any issue – Nitin Nov 05 '19 at 20:39
  • @sun_007 yes! I will look into this. I was really hoping to get a Spark solution because it's a bit difficult to re-architect at this point, but thanks for pointing me in this direction – alina Nov 05 '19 at 20:40
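For reference, the Kafka Connect route suggested in these comments amounts to having the Spark job write its filtered/enriched events to an output topic and letting Confluent's Elasticsearch sink connector stream that topic into ES. A minimal sketch of a standalone-mode connector config (the topic name and connection URL are placeholders):

    name=elasticsearch-sink
    connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
    tasks.max=1
    # placeholder: the topic the Spark job writes its enriched events to
    topics=enriched-events
    connection.url=http://localhost:9200
    # don't require Kafka record keys or a schema for the documents
    key.ignore=true
    schema.ignore=true
    type.name=_doc

The Connect framework tracks and commits the sink's consumer offsets in Kafka itself, and the connector retries failed bulk writes, which is the out-of-the-box failure handling referred to above.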

0 Answers