
I am writing a DStream to Elasticsearch using the Elasticsearch-Hadoop connector. The connector is documented here: https://www.elastic.co/guide/en/elasticsearch/hadoop/5.6/spark.html

I need to process the window, write all the documents to ES using the "JavaEsSpark.saveToEs" method, make sure that all the documents have been written, and only then commit the offsets to Kafka. Since JavaEsSpark.saveToEs inserts documents in batch mode, I cannot keep track of my documents individually.

My basic code is below. Does anyone have any suggestions?

    dstream.foreachRDD((items, time) -> {
        JavaEsSpark.saveToEs(items, "myindex/mytype");
        // wait until all the documents have been written
        // then do something else before returning (actually: commit the Kafka offsets)
    });
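
For reference, the complete flow I am aiming for looks roughly like the sketch below. It assumes the stream is the untransformed direct stream from spark-streaming-kafka-0-10 (so it can be cast to CanCommitOffsets) and that the record values are JSON strings; the class and variable names are only illustrative.

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.kafka010.CanCommitOffsets;
    import org.apache.spark.streaming.kafka010.HasOffsetRanges;
    import org.apache.spark.streaming.kafka010.OffsetRange;
    import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;

    public class EsWriteThenCommit {
        // 'stream' is assumed to be the direct Kafka stream itself
        // (not a transformed DStream), so it can be cast to CanCommitOffsets.
        static void process(JavaInputDStream<ConsumerRecord<String, String>> stream) {
            stream.foreachRDD((records, time) -> {
                // Capture the offset ranges of this micro-batch before writing.
                OffsetRange[] offsetRanges =
                        ((HasOffsetRanges) records.rdd()).offsetRanges();

                // The record values are assumed to be JSON documents here.
                JavaEsSpark.saveJsonToEs(records.map(ConsumerRecord::value),
                        "myindex/mytype");

                // Offsets are committed only after saveJsonToEs returns;
                // whether that is enough to guarantee every document was
                // written is exactly what I am asking above.
                ((CanCommitOffsets) stream.inputDStream()).commitAsync(offsetRanges);
            });
        }
    }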
Yılmaz

1 Answer


You can wrap the call in a Try (this is a Scala example):

    import scala.util.{Failure, Try}

    Try {
      rdd.saveToEs(AppSettings.Elastic.Resource, configuration)
    } match {
      case Failure(f) =>
        logger.error(s"SaveToEs failed: $f") // or whatever you want
      case _ => // success: safe to commit the Kafka offsets here
    }
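
Since the question's code is Java, a roughly equivalent sketch with a plain try/catch is shown below. It assumes the default elasticsearch-hadoop behavior, where a bulk write that still fails after the configured retries (es.batch.write.retry.count) surfaces as an EsHadoopException, and it reuses the dstream from the question plus a logger that is assumed to exist:

    import org.elasticsearch.hadoop.EsHadoopException;
    import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;

    dstream.foreachRDD((items, time) -> {
        try {
            JavaEsSpark.saveToEs(items, "myindex/mytype");
            // The write completed without an exception: commit the Kafka offsets here.
        } catch (EsHadoopException e) {
            // Do not commit the offsets; the batch can be reprocessed.
            logger.error("SaveToEs failed", e);
        }
    });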
Luc E