
I am trying to integrate Spark on EMR with Amazon OpenSearch by reading an OpenSearch document into a Spark DataFrame, using the simple code snippet below.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .config('spark.serializer', 'org.apache.spark.serializer.KryoSerializer')
         .getOrCreate())

reader = (spark.read.format("org.elasticsearch.spark.sql")
          .option("es.nodes.wan.only", "true")
          .option("es.port", "443")
          .option("es.net.ssl", "true")
          .option("es.nodes", "https://*******.ap-south-1.es.amazonaws.com")
          .load("customer_id/_doc"))

reader.show(truncate=False)

I have looked into almost all the similar questions on Stack Overflow and referred to a number of articles as well as the official documentation, but no luck yet. I have also made sure that all security groups are in order and updated the AWS OpenSearch access policy, but nothing seems to work.

The code looks straightforward enough to me, so I suspect I am missing something silly. Any help here would be highly appreciated, as this is acting as a blocker for me.

Spark version: 2.4.0, Elasticsearch version: 7.10.2, external packages/JARs used: org.elasticsearch:elasticsearch-hadoop:7.10.2
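Following up on the authentication point raised in the comments: with fine-grained access control enabled on the domain, es-hadoop needs basic-auth credentials passed as connector options. A minimal sketch of the option set, assuming a master user (the `master-user`/`master-password` values are placeholders, not from the original post):

```python
# Connector options for es-hadoop against an Amazon OpenSearch domain.
# The credentials below are hypothetical placeholders.
es_conf = {
    "es.nodes": "https://*******.ap-south-1.es.amazonaws.com",
    "es.port": "443",
    "es.nodes.wan.only": "true",
    "es.net.ssl": "true",
    # Required when fine-grained access control is enabled on the domain:
    "es.net.http.auth.user": "master-user",
    "es.net.http.auth.pass": "master-password",
}
```

These could then be applied in one go with `spark.read.format("org.elasticsearch.spark.sql").options(**es_conf).load("customer_id")`.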

  • A 401 error means that you are not authenticated. In the code you posted I don't see where you added a username/password or API key for reading the information. – Briomkez Oct 23 '22 at 11:46
  • I have currently disabled fine-grained access control on my new OpenSearch instance and I am no longer getting the 401 Unauthorized error. I am instead getting a new error: *null null* rather than *401 Unauthorized*. – Sourav Das Oct 26 '22 at 21:27
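To separate cluster-reachability and authentication problems from Spark configuration problems, one option is to hit the cluster health endpoint directly with basic auth, outside Spark entirely. A sketch using only the standard library (the endpoint and credentials here are placeholders):

```python
import base64
import urllib.request


def health_request(endpoint, user, password):
    """Build a GET request for the cluster health endpoint with basic auth."""
    req = urllib.request.Request(f"{endpoint}/_cluster/health")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    return req


# Example (placeholder endpoint/credentials); a 200 response rules out
# authentication and networking as the cause of the Spark-side failure:
# req = health_request("https://*******.ap-south-1.es.amazonaws.com",
#                      "master-user", "master-password")
# print(urllib.request.urlopen(req).read())
```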

0 Answers