
My end goal is to insert data from HDFS into Elasticsearch, but the issue I am facing is connectivity.

I am able to connect to my Elasticsearch node using the curl command below:

curl -u username -X GET 'https://xx.xxx.xx.xxx:9200/_cat/indices?v' --insecure
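
For context, --insecure just tells curl to skip TLS certificate verification. A rough Python analogue of the same check (host and credentials are placeholders) would be:

    import requests

    # verify=False is the requests equivalent of curl --insecure:
    # it disables TLS certificate verification.
    resp = requests.get(
        "https://xx.xxx.xx.xxx:9200/_cat/indices?v",
        auth=("username", "password"),
        verify=False,
    )
    print(resp.status_code, resp.text)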

When it comes to connecting with Spark, however, I am unable to do so. My command to insert the data is:

    df.write.mode("append").format('org.elasticsearch.spark.sql') \
        .option("es.net.http.auth.user", "username") \
        .option("es.net.http.auth.pass", "password") \
        .option("es.index.auto.create", "true") \
        .option('es.nodes', 'https://xx.xxx.xx.xxx') \
        .option('es.port', '9200') \
        .save('my-index/my-doctype')

The error I am getting is:

org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot detect ES version - typically this happens if the network/Elasticsearch cluster is not accessible or when targeting a WAN/Cloud instance without the proper setting 'es.nodes.wan.only'
....
....
Caused by: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings) - all nodes failed; tried [[xx.xxx.xx.xxx:9200]]
....
...

So, what would be the PySpark equivalent of curl's --insecure flag?

Thanks

Ayush Goyal

3 Answers


After many attempts and different config options, I found a way to connect to Elasticsearch running on HTTPS insecurely:

    dfToEs.write.mode("append").format('org.elasticsearch.spark.sql') \
        .option("es.net.http.auth.user", username) \
        .option("es.net.http.auth.pass", password) \
        .option("es.net.ssl", "true") \
        .option("es.net.ssl.cert.allow.self.signed", "true") \
        .option("mergeSchema", "true") \
        .option('es.index.auto.create', 'true') \
        .option('es.nodes', 'https://{}'.format(es_ip)) \
        .option('es.port', '9200') \
        .option('es.batch.write.retry.wait', '100s') \
        .save('{index}/_doc'.format(index=index))

Along with

    (es.net.ssl, true)

we also have to allow the self-signed certificate, like below:

    (es.net.ssl.cert.allow.self.signed, true)
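
For a quick sanity check before writing, a minimal read with the same SSL options can be used; this is an untested sketch that assumes the same es_ip, username, password, and index variables as above:

    dfFromEs = spark.read.format('org.elasticsearch.spark.sql') \
        .option("es.net.http.auth.user", username) \
        .option("es.net.http.auth.pass", password) \
        .option("es.net.ssl", "true") \
        .option("es.net.ssl.cert.allow.self.signed", "true") \
        .option('es.nodes', 'https://{}'.format(es_ip)) \
        .option('es.port', '9200') \
        .load('{index}/_doc'.format(index=index))
    dfFromEs.show(5)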
Ayush Goyal

I checked a lot of things and finally I can write to the AWS Elasticsearch Service (ES), but with Scala/Spark.

  1. In a VPC, create security groups that allow access from EMR to ES on port 443 (an inbound rule in ES for the EMR security group, and an inbound rule in EMR on the same port).
  2. Check connectivity from the EMR master node with a telnet command:
    telnet xyz.eu-west-1.es.amazonaws.com 443
  3. Once the above checks out, check at the application level with a curl command:

    curl 'https://xyz.eu-west-1.es.amazonaws.com:443/domainname/_search?pretty=true&q=*'

  4. Then move on to the code. In my case I tested with spark-shell, with the server confs included at startup like this:

     spark-shell --jars elasticsearch-spark-20_2.11-7.1.1.jar --conf spark.es.nodes="xyz.eu-west-1.es.amazonaws.com" --conf spark.es.port=443 --conf spark.es.nodes.wan.only=true --conf spark.es.nodes.discovery="false" --conf spark.es.index.auto.create="true" --conf spark.es.resource="domain/doc" --conf spark.es.scheme="https"
    
  5. Finally, the code to write:
    import java.util.Date
    import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
    import org.apache.spark.sql.functions.lit
    import org.elasticsearch.spark._
    import org.elasticsearch.spark.sql._
    val dateformat =  new java.text.SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss")
    val currentdate = dateformat.format(new Date)
    val colorsDF = spark.read.json("multilinecolors.json")
    val mcolors = colorsDF.withColumn("Date",lit(currentdate))
    mcolors.write.mode("append")
      .format("org.elasticsearch.spark.sql")
      .option("es.net.http.auth.user", "")
      .option("es.net.http.auth.pass", "")
      .option("es.net.ssl", "true")
      .option("es.net.ssl.cert.allow.self.signed", "true")
      .option("mergeSchema", "true")
      .option("es.index.auto.create", "true")
      .option("es.nodes", "https://xyz.eu-west-1.es.amazonaws.com")
      .option("es.port", "443")
      .option("es.batch.write.retry.wait", "100")
      .save("domainname/_doc")
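
Since the original question was about PySpark, here is a hedged, untested PySpark sketch of the same flow; the jar path, endpoint, and index name are the same placeholders as above:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import current_timestamp, date_format

    # These confs mirror the spark-shell flags above.
    spark = (
        SparkSession.builder
        .config("spark.jars", "elasticsearch-spark-20_2.11-7.1.1.jar")
        .config("spark.es.nodes", "xyz.eu-west-1.es.amazonaws.com")
        .config("spark.es.port", "443")
        .config("spark.es.nodes.wan.only", "true")
        .config("spark.es.nodes.discovery", "false")
        .getOrCreate()
    )

    colorsDF = spark.read.json("multilinecolors.json")
    # Same timestamp column as the Scala version, built with Spark functions.
    mcolors = colorsDF.withColumn(
        "Date", date_format(current_timestamp(), "yyyy-MM-dd'T'HH:mm:ss"))

    (mcolors.write.mode("append")
        .format("org.elasticsearch.spark.sql")
        .option("es.net.ssl", "true")
        .option("es.index.auto.create", "true")
        .option("es.nodes", "https://xyz.eu-west-1.es.amazonaws.com")
        .option("es.port", "443")
        .save("domainname/_doc"))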
Carlos Gomez

Can you try with the SparkConf settings below?

import org.apache.spark.SparkConf

val sparkConf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.es.index.auto.create", "true")
  .set("spark.es.nodes", "yourESaddress")
  .set("spark.es.port", "9200")
  .set("spark.es.net.http.auth.user", "")
  .set("spark.es.net.http.auth.pass", "")
  .set("spark.es.resource", indexName)
  .set("spark.es.nodes.wan.only", "true")

If you still face the problem, then set es.net.ssl = true and see.

If you still get the error, try adding the configs below:

'es.resource' = 'ctrl_rater_resumen_lla/hb',
'es.nodes' = 'localhost',
'es.port' = '9200',
'es.index.auto.create' = 'true',
'es.index.read.missing.as.empty' = 'true',
'es.net.ssl' = 'false',
'es.nodes.client.only' = 'false',
'es.nodes.wan.only' = 'true',
'es.net.http.auth.user' = 'xxxxx',
'es.net.http.auth.pass' = 'xxxxx',
'es.nodes.discovery' = 'false'
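
If it helps, here is a sketch of feeding that list of settings to a PySpark writer; df stands for your DataFrame, and the values are copied from the list above:

    # df is your DataFrame to index; settings collected in one dict.
    es_conf = {
        "es.resource": "ctrl_rater_resumen_lla/hb",
        "es.nodes": "localhost",
        "es.port": "9200",
        "es.index.auto.create": "true",
        "es.index.read.missing.as.empty": "true",
        "es.net.ssl": "false",
        "es.nodes.client.only": "false",
        "es.nodes.wan.only": "true",
        "es.net.http.auth.user": "xxxxx",
        "es.net.http.auth.pass": "xxxxx",
        "es.nodes.discovery": "false",
    }

    writer = df.write.mode("append").format("org.elasticsearch.spark.sql")
    for key, value in es_conf.items():
        writer = writer.option(key, value)
    writer.save("ctrl_rater_resumen_lla/hb")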

sathya