I'm not too familiar with Kafka but I would like to know what's the best way to read data in batches from Kafka so I can use Elasticsearch Bulk Api to load the data faster and reliably.
Btw, am using Vertx for my Kafka consumer
Thank you,
I'm not too familiar with Kafka but I would like to know what's the best way to read data in batches from Kafka so I can use Elasticsearch Bulk Api to load the data faster and reliably.
Btw, am using Vertx for my Kafka consumer
Thank you,
I cannot tell if this is the best approach or not, but when I started looking for similar functionality I could not find any readily available frameworks. I found this project:
https://github.com/reachkrishnaraj/kafka-elasticsearch-standalone-consumer/tree/branch2.0
and started contributing to it as it was not doing everything I wanted, and was also not easily scalable. Now the 2.0 version is quite reliable and we use it in production in our company processing/indexing 300M+ events per day.
This is not a self-promotion :) - just sharing how we do the same type of work. There might be other options right now as well, of course.
https://github.com/confluentinc/kafka-connect-elasticsearch
Or You can try this source
https://github.com/reachkrishnaraj/kafka-elasticsearch-standalone-consumer
Running as a standard Jar
**1. Download the code into a $INDEXER_HOME dir.
**2. cp $INDEXER_HOME/src/main/resources/kafka-es-indexer.properties.template /your/absolute/path/kafka-es-indexer.properties file - update all relevant properties as explained in the comments
**3. cp $INDEXER_HOME/src/main/resources/logback.xml.template /your/absolute/path/logback.xml
specify directory you want to store logs in:
adjust values of max sizes and number of log files as needed
**4. build/create the app jar (make sure you have MAven installed):
cd $INDEXER_HOME
mvn clean package
The kafka-es-indexer-2.0.jar will be created in the $INDEXER_HOME/bin. All dependencies will be placed into $INDEXER_HOME/bin/lib. All JAR dependencies are linked via kafka-es-indexer-2.0.jar manifest.
**5. edit your $INDEXER_HOME/run_indexer.sh script: -- make it executable if needed (chmod a+x $INDEXER_HOME/run_indexer.sh) -- update properties marked with "CHANGE FOR YOUR ENV" comments - according to your environment
**6. run the app [use JDK1.8] :
./run_indexer.sh
I used spark streaming and the it was quite a simple implementation using Scala.