
I am writing a DStream to Elasticsearch using the Elasticsearch-Hadoop connector. The connector is documented here: https://www.elastic.co/guide/en/elasticsearch/hadoop/5.6/spark.html

I need to process the window, write all the documents to ES using the "JavaEsSpark.saveToEs" method, make sure that all the documents have been written, and only then commit the offsets to Kafka. Since JavaEsSpark.saveToEs inserts documents in batch mode, I cannot keep track of my documents individually.

My basic code is below. Does anyone have any suggestions?

    dstream.foreachRDD((items, time) -> {
        JavaEsSpark.saveToEs(items, "myindex/mytype");
        // wait until all the documents have been written
        // then do something else before returning (actually: commit the Kafka offsets)
    });
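
For reference, the complete flow I am aiming for looks roughly like the sketch below. It assumes the stream is the untransformed direct stream from spark-streaming-kafka-0-10 (so it can be cast to CanCommitOffsets) and that the record values are JSON strings; the class and variable names are only illustrative.

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.kafka010.CanCommitOffsets;
    import org.apache.spark.streaming.kafka010.HasOffsetRanges;
    import org.apache.spark.streaming.kafka010.OffsetRange;
    import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;

    public class EsWriteThenCommit {
        // 'stream' is assumed to be the direct Kafka stream itself
        // (not a transformed DStream), so it can be cast to CanCommitOffsets.
        static void process(JavaInputDStream<ConsumerRecord<String, String>> stream) {
            stream.foreachRDD((records, time) -> {
                // Capture the offset ranges of this micro-batch before writing.
                OffsetRange[] offsetRanges =
                        ((HasOffsetRanges) records.rdd()).offsetRanges();

                // The record values are assumed to be JSON documents here.
                JavaEsSpark.saveJsonToEs(records.map(ConsumerRecord::value),
                        "myindex/mytype");

                // Offsets are committed only after saveJsonToEs returns;
                // whether that is enough to guarantee every document was
                // written is exactly what I am asking above.
                ((CanCommitOffsets) stream.inputDStream()).commitAsync(offsetRanges);
            });
        }
    }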
Yılmaz

1 Answer


You can wrap the call in a Try (this is a Scala example):

    import scala.util.{Failure, Try}

    Try {
      rdd.saveToEs(AppSettings.Elastic.Resource, configuration)
    } match {
      case Failure(f) =>
        logger.error(s"SaveToEs failed: $f") // or whatever you want
      case _ => // success: safe to commit the Kafka offsets here
    }
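
Since the question's code is Java, a roughly equivalent sketch with a plain try/catch is shown below. It assumes the default elasticsearch-hadoop behavior, where a bulk write that still fails after the configured retries (es.batch.write.retry.count) surfaces as an EsHadoopException, and it reuses the dstream from the question plus a logger that is assumed to exist:

    import org.elasticsearch.hadoop.EsHadoopException;
    import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;

    dstream.foreachRDD((items, time) -> {
        try {
            JavaEsSpark.saveToEs(items, "myindex/mytype");
            // The write completed without an exception: commit the Kafka offsets here.
        } catch (EsHadoopException e) {
            // Do not commit the offsets; the batch can be reprocessed.
            logger.error("SaveToEs failed", e);
        }
    });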
Luc E