How can I batch Kafka reads to Elasticsearch

Question

I'm not too familiar with Kafka, but I would like to know the best way to read data from Kafka in batches so that I can use the Elasticsearch Bulk API to load the data faster and more reliably.

By the way, I am using Vert.x for my Kafka consumer.

Thank you,

score 4 · Accepted Answer · answered Nov 14 '15 at 03:57

I cannot tell if this is the best approach or not, but when I started looking for similar functionality I could not find any readily available frameworks. I found this project:

https://github.com/reachkrishnaraj/kafka-elasticsearch-standalone-consumer/tree/branch2.0

and started contributing to it, as it was not doing everything I wanted and was also not easily scalable. The 2.0 version is now quite reliable, and we use it in production at our company, processing and indexing 300M+ events per day.

This is not self-promotion :) - just sharing how we do the same type of work. There might be other options by now as well, of course.
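For illustration only (this is not code from the project above), here is a minimal sketch of the general pattern: collect the polled message values, split them into fixed-size batches, and render each batch as a newline-delimited body for the Elasticsearch `_bulk` endpoint. The class `BulkBatcher`, its method names, and the `events` index are hypothetical, and the sketch assumes every message is already a single-line JSON document. Older Elasticsearch versions also expect a `_type` in the action line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: batches message payloads and renders _bulk request bodies. */
class BulkBatcher {

    /** Splits messages into consecutive batches of at most batchSize elements. */
    static List<List<String>> partition(List<String> messages, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < messages.size(); i += batchSize) {
            int end = Math.min(i + batchSize, messages.size());
            batches.add(new ArrayList<>(messages.subList(i, end)));
        }
        return batches;
    }

    /**
     * Renders one batch as newline-delimited JSON: an action line followed by
     * the document itself, each terminated by a newline (the last one included).
     */
    static String toBulkBody(String index, List<String> jsonDocs) {
        StringBuilder body = new StringBuilder();
        for (String doc : jsonDocs) {
            body.append("{\"index\":{\"_index\":\"").append(index).append("\"}}\n");
            body.append(doc).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the values of records returned by Kafka poll() calls.
        List<String> polled = Arrays.asList("{\"id\":1}", "{\"id\":2}", "{\"id\":3}");
        for (List<String> batch : partition(polled, 2)) {
            // A real consumer would POST this body to /_bulk, check the
            // per-item results, and only then commit the Kafka offsets.
            System.out.print(toBulkBody("events", batch));
        }
    }
}
```

Committing offsets only after the bulk request has succeeded gives at-least-once delivery: a failed batch is read again after a restart. Using a stable document `_id` (for example one derived from topic, partition, and offset) would keep such replays from creating duplicates.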

Note: the latest development has moved to this repo, which has Docker and Gradle support: https://github.com/BigDataDevs/kafka-elasticsearch-consumer - Marina, Mar 22 '16 at 17:07

score 1 · Answer 2 · answered Apr 19 '17 at 10:56

You can use the Kafka Connect Elasticsearch connector:

https://github.com/confluentinc/kafka-connect-elasticsearch

Or you can try this project:

https://github.com/reachkrishnaraj/kafka-elasticsearch-standalone-consumer

Running as a standard JAR

1. Download the code into a $INDEXER_HOME directory.

2. Copy the properties template and update all relevant properties as explained in its comments:

cp $INDEXER_HOME/src/main/resources/kafka-es-indexer.properties.template /your/absolute/path/kafka-es-indexer.properties

3. Copy the logging configuration template:

cp $INDEXER_HOME/src/main/resources/logback.xml.template /your/absolute/path/logback.xml

In logback.xml, specify the directory you want to store logs in, and adjust the maximum sizes and number of log files as needed.

4. Build the app JAR (make sure you have Maven installed):

cd $INDEXER_HOME
mvn clean package

The kafka-es-indexer-2.0.jar will be created in $INDEXER_HOME/bin. All dependencies will be placed into $INDEXER_HOME/bin/lib. All JAR dependencies are linked via the kafka-es-indexer-2.0.jar manifest.

5. Edit your $INDEXER_HOME/run_indexer.sh script:

- make it executable if needed (chmod a+x $INDEXER_HOME/run_indexer.sh)
- update the properties marked with "CHANGE FOR YOUR ENV" comments according to your environment

6. Run the app (use JDK 1.8):

./run_indexer.sh

score 0 · Answer 3 · answered Mar 18 '16 at 13:35

I used Spark Streaming, and it was quite a simple implementation using Scala.

Philip K. Adetiloye
