
I'm writing 20 million rows of data to Elasticsearch (Azure Cloud) using the spark-es connector. After writing 13 million rows successfully, I got the error below:


    Caused by: EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed; tried [[.......westeurope.azure.elastic-cloud.com:9243]]

My code for writing data from Spark to Elasticsearch:

    data
      .write
      .format("org.elasticsearch.spark.sql")
      .option("es.nodes", node)
      .option("es.port", port)
      .option("es.net.http.auth.user", username)
      .option("es.net.http.auth.pass", password)
      .option("es.net.ssl", "true")
      .option("es.nodes.wan.only", "true")
      .option("es.mapping.id", "id")
      .mode(writingMode)
      .save(index)

Any help or suggestions would be appreciated!

mham
  • Most likely your `Elasticsearch` cluster is not too happy with the amount of data you are sending its way. It could be many things: disk space? CPU? ... You should check the `Elasticsearch` logs. – Paulo Oct 16 '22 at 18:36

1 Answer


When you run spark-submit, try tuning the --driver-memory and --executor-memory parameters.

The following works for my setup. I do not know your system's specifications, so you can try experimenting with higher values.

    /spark/bin/spark-submit --driver-memory 4g --executor-memory 6g <jarname.jar>

The issue is most likely the amount of load you are putting on the cluster, not your Spark-to-Elasticsearch connection.
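
If memory tuning alone does not help, it may also be worth throttling the write itself so that each bulk request is smaller and rejected requests are retried more patiently. Below is a minimal sketch using the connector's documented batch settings (es.batch.size.entries, es.batch.size.bytes, es.batch.write.retry.count, es.batch.write.retry.wait); the specific values and the repartition count are illustrative, not tuned recommendations, and data, node, port, username, password, writingMode, and index are the same names from the question:

    data
      .repartition(8)                             // fewer partitions => fewer concurrent bulk writers (8 is illustrative)
      .write
      .format("org.elasticsearch.spark.sql")
      .option("es.nodes", node)
      .option("es.port", port)
      .option("es.net.http.auth.user", username)
      .option("es.net.http.auth.pass", password)
      .option("es.net.ssl", "true")
      .option("es.nodes.wan.only", "true")
      .option("es.mapping.id", "id")
      .option("es.batch.size.entries", "500")     // smaller bulk requests (connector default is 1000)
      .option("es.batch.size.bytes", "1mb")       // cap the byte size of each bulk request
      .option("es.batch.write.retry.count", "6")  // retry rejected bulk requests more times (default 3)
      .option("es.batch.write.retry.wait", "30s") // wait longer between retries (default 10s)
      .mode(writingMode)
      .save(index)

Smaller, retried batches give an overloaded cluster room to recover between bulk requests, which is often enough to get a long write past EsHadoopNoNodesLeftException.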