Questions tagged [elasticsearch-spark]
37 questions
1
vote
0 answers
Unable to connect to AWS Elasticsearch node-to-node from Spark
I have an Elasticsearch Service on AWS that I would like to access from Spark with elasticsearch-spark in a node-to-node configuration, so that Spark workers can connect to the Elasticsearch nodes in parallel. However, Amazon only provides one endpoint to…

ami232
- 55
- 8
1
vote
0 answers
Elasticsearch-Spark has a dependency clash with Play JSON. Error message inside
I am working on Zeppelin 0.7.1 and I need to use both the elasticsearch-spark dependency and the play-json dependency. However, these two are for some reason incompatible with each other. If I remove the JSON dependency, es-spark works fine. The…

Mnemosyne
- 1,162
- 4
- 13
- 45
1
vote
1 answer
Spark Structured Streaming ForeachWriter unable to get sparkContext
I'm reading JSON data from a Kafka queue using Spark Structured Streaming, and I need to write the JSON data into Elasticsearch.
However, I can't get the sparkContext inside the ForeachWriter to convert the JSON to an RDD. It throws an NPE.
How can I get…

Philip K. Adetiloye
- 3,102
- 4
- 37
- 63
1
vote
1 answer
Apache Spark Java API + Twitter4j + exception while saving Twitter stream to Elasticsearch
I am trying to set up a Twitter stream using the Apache Spark Java API. While saving the Twitter stream to Elasticsearch, I am getting an exception. I think the problem is that I am trying to save the raw tweet. Please let me know what I can try to…

Manali Gaikwad
- 61
- 1
- 1
- 8
0
votes
0 answers
Using elasticsearch-spark connector in PySpark, unable to get DENSE_VECTOR field from Elasticsearch
I'm using PySpark to query Elasticsearch and then generate JSON and Pickle files.
My Elasticsearch index sr-data-index has a field called word_embedding, which is of type DENSE_VECTOR. Using the elasticsearch-spark connector, I am able to query from…

Sowjanya R Bhat
- 1,128
- 10
- 19
0
votes
0 answers
How can I overcome "Position for field not found in row; typically this is caused by a mapping inconsistency" in PySpark?
I'm new to PySpark and Elasticsearch. All I'm trying to do is read an index from OpenSearch (v7.10.2) and dump it as Parquet to S3 using PySpark (v3.2.1), running on Databricks.
I managed to load the schema successfully from the index mapping, like…

Kludge
- 2,653
- 4
- 20
- 42
0
votes
0 answers
How to save PySpark DataFrame to Elasticsearch (running on Docker) using elasticsearch-hadoop
I am trying to write a PySpark DataFrame to an Elasticsearch instance running on Docker. I am unable to connect to the Elasticsearch instance using elasticsearch-hadoop. When I try to save the DataFrame, I get an error that…

mondal.alex
- 11
- 2
0
votes
1 answer
Writing data from Spark to Elasticsearch: Connection error
I'm writing 20 million rows of data to Elasticsearch (Azure Cloud) using the spark-es connector. After writing 13 million rows successfully, I got the error below:
Caused by: EsHadoopNoNodesLeftException: Connection error (check network and/or…

mham
- 145
- 4
- 18
0
votes
1 answer
Spark fails to read from Elasticsearch/OpenSearch. Invalid map received dynamic_date_formats
Hi, I'm trying to use Scala 2.11.12, Spark 2.3.0, and elasticsearch-spark-20 7.7.0 to read from an OpenSearch 1.3.4 index with the following code:
spark.read.format("org.elasticsearch.spark.sql")
.load("myIndex")
.filter('Timestamp ===…

Tiz
- 413
- 1
- 5
- 21
0
votes
0 answers
Create documents that do not exist, skip others
I'm working in a concurrent environment where an index being built by a Spark job may receive updates for the same document ID from the job itself and from other sources. It is assumed that updates from other sources are fresher, and the Spark job needs to silently…

Etki
- 2,042
- 2
- 17
- 40
0
votes
1 answer
Spark Structured Streaming from Kafka to Elasticsearch
I want to write a Spark Streaming job from Kafka to Elasticsearch. I want to detect the schema dynamically while reading from Kafka.
Can you help me do that?
I know this can be done in Spark batch processing with the line below.
val schema =…

Siva Samraj
- 37
- 1
- 5
0
votes
1 answer
How to write a DataFrame with a struct column into Elasticsearch via PySpark
I'm trying to write a DataFrame containing a struct column into Elasticsearch:
df1 = spark.createDataFrame([{"date": "2020.04.10","approach": "test", "outlier_score": 1, "a":"1","b":2},
{"date": "2020.04.10","approach": "test",…

Andrey Sapegin
- 454
- 8
- 33
0
votes
1 answer
Spark-Elasticsearch: fetch filtered records from Elasticsearch using Spark
I have a map as follows:
Map("index1" -> List("a", "b", "c"))
My data in Elasticsearch has a field called "names". I want to query Elasticsearch from Spark and return all records that have "a", "b", or "c" as the value of the "name" field.
I…

mythic
- 535
- 7
- 21
0
votes
1 answer
NoSuchMethodError occurring in EsSparkSQL$.saveToEs method
Exception in thread "main" java.lang.NoSuchMethodError:
org.elasticsearch.spark.sql.EsSparkSQL$.saveToEs
The above error occurs when inserting a document from a Spark application into the Elasticsearch cluster.
val conf = new SparkConf()
…

Manikandan Muthuraj
- 21
- 3
0
votes
1 answer
How to create an index in Elasticsearch using elasticsearch-spark?
I want to create an index in Elasticsearch from my Spark transformation. What is the best way to do it using the elasticsearch-spark library?
Kind regards

Clyde Barrow
- 1,924
- 8
- 30
- 60