I am trying to integrate Spark on EMR with Amazon OpenSearch by reading an OpenSearch document into a Spark DataFrame, using the simple code snippet below.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer')
         .getOrCreate())

reader = (spark.read.format("org.elasticsearch.spark.sql")
          .option("es.nodes.wan.only", "true")   # managed cluster: talk only to the declared endpoint
          .option("es.port", "443")
          .option("es.net.ssl", "true")
          .option("es.nodes", "https://*******.ap-south-1.es.amazonaws.com")
          .load("customer_id/_doc"))

reader.show(truncate=False)
I have looked into almost all of the similar questions asked on Stack Overflow and referred to a number of articles as well as the official documentation, but no luck yet. I have also made sure that all security groups are taken care of and have updated the AWS OpenSearch access policy, but nothing seems to be working.
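To rule out basic networking, a quick reachability check from the EMR master node can be done with something like the sketch below (the endpoint is the same masked placeholder as in the snippet above; a 200 with cluster info, or even a 403, would at least show the domain is reachable):

import requests

# Hypothetical reachability check; the endpoint is a placeholder for the real domain URL.
endpoint = "https://*******.ap-south-1.es.amazonaws.com"
resp = requests.get(endpoint, timeout=10)
print(resp.status_code)
print(resp.text)  # cluster info JSON if access is allowed, an error body otherwise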
The code looks straightforward enough to me, so I suspect I am missing something silly. Any help would be highly appreciated, as this issue is blocking me.
Spark version: 2.4.0
Elasticsearch version: 7.10.2
External packages/JARs used: org.elasticsearch:elasticsearch-hadoop:7.10.2
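For completeness, here is a minimal sketch of how the connector listed above can be attached to the session via spark.jars.packages (equivalent to passing the package on the spark-submit command line; the rest of the session config is the same as in the snippet above):

from pyspark.sql import SparkSession

# Minimal sketch: pull the elasticsearch-hadoop connector when the session starts.
# spark.jars.packages must be set before the SparkContext is created.
spark = (SparkSession.builder
         .config('spark.jars.packages', 'org.elasticsearch:elasticsearch-hadoop:7.10.2')
         .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer')
         .getOrCreate())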