2

I'm not too familiar with Kafka, but I would like to know the best way to read data in batches from Kafka so that I can use the Elasticsearch Bulk API to load the data quickly and reliably.

By the way, I am using Vert.x for my Kafka consumer.

Thank you,

Philip K. Adetiloye

3 Answers

4

I cannot tell if this is the best approach or not, but when I started looking for similar functionality I could not find any readily available frameworks. I found this project:

https://github.com/reachkrishnaraj/kafka-elasticsearch-standalone-consumer/tree/branch2.0

I started contributing to it because it did not do everything I wanted and was not easily scalable. The 2.0 version is now quite reliable, and we use it in production at our company, processing and indexing 300M+ events per day.

This is not self-promotion :) - just sharing how we do the same type of work. There may be other options by now as well, of course.
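For anyone who just wants to see the general batch-then-bulk pattern, here is a minimal sketch in Python. It is not how the project above is implemented. The helper names `batches` and `bulk_body`, the index name `events`, and the sample records are all made up for illustration, and the Kafka polling and the HTTP call are left out. It only shows grouping records into fixed-size batches and building the newline-delimited JSON body that the `_bulk` endpoint expects (older Elasticsearch versions may also need a `_type` in the action line).

```python
import json
from itertools import islice
from typing import Iterable, Iterator, List


def batches(records: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of at most `size` records from any iterable (hypothetical helper)."""
    it = iter(records)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bulk_body(index: str, docs: List[dict]) -> str:
    """Build a newline-delimited JSON body for the _bulk endpoint (hypothetical helper)."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                            # document source line
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline


if __name__ == "__main__":
    # Stand-in for messages polled from Kafka (made-up sample data).
    events = [{"id": i, "msg": f"event {i}"} for i in range(5)]
    for chunk in batches(events, size=2):
        body = bulk_body("events", chunk)
        # In a real consumer you would POST `body` to /_bulk here, check the
        # response for per-item errors, and only then commit the Kafka offsets
        # for this batch, so a failed batch can be re-read.
        print(body, end="")
```

Committing offsets only after a successful bulk request gives at-least-once delivery, so it helps to use a stable document `_id` if duplicates would be a problem.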

Marina
  • Note: the latest development has moved to this repo, which has Docker and Gradle support: https://github.com/BigDataDevs/kafka-elasticsearch-consumer – Marina Mar 22 '16 at 17:07
1

https://github.com/confluentinc/kafka-connect-elasticsearch

Or you can try this project:

https://github.com/reachkrishnaraj/kafka-elasticsearch-standalone-consumer

Running as a standard JAR:

1. Download the code into a $INDEXER_HOME directory.

2. Copy the properties template and update all relevant properties as explained in its comments:

cp $INDEXER_HOME/src/main/resources/kafka-es-indexer.properties.template /your/absolute/path/kafka-es-indexer.properties

3. Copy the logging configuration template:

cp $INDEXER_HOME/src/main/resources/logback.xml.template /your/absolute/path/logback.xml

In that file, specify the directory you want to store logs in, and adjust the maximum sizes and number of log files as needed.

4. Build the app JAR (make sure you have Maven installed):

cd $INDEXER_HOME
mvn clean package

The kafka-es-indexer-2.0.jar will be created in $INDEXER_HOME/bin. All dependencies will be placed into $INDEXER_HOME/bin/lib. All JAR dependencies are linked via the kafka-es-indexer-2.0.jar manifest.

5. Edit your $INDEXER_HOME/run_indexer.sh script: make it executable if needed (chmod a+x $INDEXER_HOME/run_indexer.sh), and update the properties marked with "CHANGE FOR YOUR ENV" comments according to your environment.

6. Run the app (use JDK 1.8):

./run_indexer.sh
wcc526
0

I used Spark Streaming, and it was quite a simple implementation using Scala.

Philip K. Adetiloye