Questions tagged [elasticsearch-spark]
37 questions
3 votes, 1 answer
Incompatible OpenSearch 1.3 connector for Spark 3.x
We used to have Spark 2.4.4, Scala 2.11, and Elasticsearch 6.8 on our servers. Our servers were recently upgraded: Spark to 3.1.2 and Scala to 2.12. We then got the below error when writing records to Elasticsearch. So we…

Purna Mahesh (93)
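Errors like this after a Spark/Scala upgrade are typically an artifact mismatch: the ES-Hadoop connector is published per Spark line and Scala version. As a hedged sketch (the version number is illustrative, not taken from the question), the Spark 3.x / Scala 2.12 build lives under a separate artifact id:

```scala
// build.sbt — illustrative version; pick the release matching your cluster.
// %% appends the Scala suffix, resolving to elasticsearch-spark-30_2.12,
// the Spark 3.x build, instead of elasticsearch-spark-20_2.11 (Spark 2.x / Scala 2.11).
libraryDependencies += "org.elasticsearch" %% "elasticsearch-spark-30" % "7.15.0"
```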
3 votes, 1 answer
SparkContext: Error initializing SparkContext while running Spark job
I'm writing a Spark program that loads data from Elasticsearch to HDFS, but I get "Error initializing SparkContext" while running the job. The error occurs while creating the Spark session.
Hadoop: 3.2.1
Spark: 2.4.4
Elasticsearch Spark (for Spark…

Uttam Sapkota (69)
3 votes, 1 answer
org.elasticsearch.hadoop.rest.EsHadoopRemoteException: search_context_missing_exception: No search context found for id
Spark tasks are failing with "No search context found for id". I tried a couple of options, such as:
spark.es.input.max.docs.per.partition 250
spark.es.scroll.size 100
spark.es.batch.size.bytes 32mb
But tasks are still failing. We are using:
…

Sky (2,509)
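A search context expires when a scroll slice is not consumed before its keep-alive elapses, so alongside the partition and scroll-size settings above, the scroll keep-alive itself can be raised. A hedged sketch of setting these on the reader (the index name and "30m" value are illustrative):

```scala
// Hedged sketch: scroll tuning set as reader options; values are illustrative.
val df = spark.read.format("org.elasticsearch.spark.sql")
  .option("es.scroll.keepalive", "30m")            // keep the scroll context alive longer
  .option("es.input.max.docs.per.partition", "250")
  .option("es.scroll.size", "100")
  .load("my-index")
```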
3 votes, 2 answers
Elasticsearch could not write all entries: maybe ES was overloaded
I have an application where I read CSV files, do some transformations, and then push them to Elasticsearch from Spark itself, like this:
input.write.format("org.elasticsearch.spark.sql")
.mode(SaveMode.Append)
…

hard coder (5,449)
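"Could not write all entries" usually means Elasticsearch is rejecting bulk requests faster than the connector retries them, so the usual mitigation is to shrink the bulk batches and back off harder. A hedged sketch built on the write call above (the index name and all values are illustrative):

```scala
// Hedged sketch: throttle bulk writes so ES can keep up; values are illustrative.
input.write.format("org.elasticsearch.spark.sql")
  .mode(SaveMode.Append)
  .option("es.batch.size.entries", "500")      // smaller bulk requests
  .option("es.batch.write.retry.count", "6")   // more retries on rejection
  .option("es.batch.write.retry.wait", "30s")  // longer back-off between retries
  .save("my-index")
```

Reducing the number of concurrent Spark tasks writing to the cluster (fewer partitions) has the same effect from the other direction.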
3 votes, 1 answer
How to convert types when reading data from Elasticsearch using elasticsearch-spark in Spark
When I try to read data from Elasticsearch using the esRDD("index") function in elasticsearch-spark, I get the results as type org.apache.spark.rdd.RDD[(String, scala.collection.Map[String,AnyRef])]. And when I check the values, they are all of type…

PC9527 (308)
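The AnyRef values in each record map can be narrowed with ordinary pattern matching. A minimal sketch, assuming the caller knows the expected field type (the helper name and the Number/String cases are hypothetical, not from the question):

```scala
// Hedged sketch: narrowing an AnyRef value from an esRDD record map.
// Numeric ES fields may arrive boxed as java.lang.Integer or java.lang.Long,
// so matching on java.lang.Number covers both.
def asLong(doc: scala.collection.Map[String, AnyRef], field: String): Option[Long] =
  doc.get(field).collect {
    case n: java.lang.Number => n.longValue
    case s: String           => s.toLong
  }
```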
2 votes, 1 answer
Elasticsearch Spark reading slow
Reading from Elasticsearch v6.2 into Spark using the prescribed Spark connector org.elasticsearch:elasticsearch-spark-20_2.11:6.3.2 is horrendously slow. This is from a 3-node ES cluster with index:
curl https://server/_cat/indices?v
green open …

ixaxaar (6,411)
2 votes, 0 answers
Getting an error while writing to Elasticsearch from Spark with a custom mapping id
I'm trying to write a DataFrame from Spark to Elasticsearch with a custom mapping id, and when I do I get the below error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 14.0 failed 16 times, most recent…

knowledge_seeker (362)
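For context, a custom mapping id is set by naming an existing DataFrame column via es.mapping.id; failures here commonly trace back to that column being missing or null in some rows. A hedged sketch (the "doc_id" column and index name are hypothetical):

```scala
// Hedged sketch: use the "doc_id" column as the document _id.
// The named column must exist and be non-null in every row, or tasks fail.
df.write.format("org.elasticsearch.spark.sql")
  .option("es.mapping.id", "doc_id")
  .mode(SaveMode.Append)
  .save("my-index")
```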
1 vote, 0 answers
org.elasticsearch.hadoop.rest.EsHadoopRemoteException mapper_parsing_exception: failed to parse field with date
I am migrating an existing application that uses elasticsearch-spark 7.6.0 to the latest version, i.e. elasticsearch-spark-30_2.12:7.15.0. I am loading ES data with a date mapping as below:
"my_partition_key": {
"format": "yyyy-MM-dd",
…

Naresh G (83)
1 vote, 1 answer
Elasticsearch with Spark, dynamic index creation based on DataFrame column
I have a Spark DataFrame which has a column, say "name". The name can have different values within a single DataFrame. When I write my data to Elasticsearch using Spark (Scala), I want to write the data to different indexes based on the value of the…

mythic (535)
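The connector supports this directly: a {field} placeholder in the resource name is substituted per document from that column, routing each row to its own index. A hedged sketch using the "name" column from the question (the index pattern is illustrative):

```scala
// Hedged sketch: each row is written to index-<value of its "name" column>,
// e.g. a row with name = "alice" goes to index "index-alice".
df.write.format("org.elasticsearch.spark.sql")
  .mode(SaveMode.Append)
  .save("index-{name}")
```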
1 vote, 0 answers
How to filter a PySpark SQL DataFrame read from Elasticsearch by a metadata field (e.g. by _id)?
I am reading a PySpark SQL DataFrame from an Elasticsearch index with the read option es.read.metadata=True. I want to filter the data by a condition on a metadata field, but I get an empty result, although there should be one. Is it possible to get the…

David206 (85)
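One workaround worth noting: filters on the _metadata column are applied in Spark after the read, but an _id condition can instead be pushed down to Elasticsearch itself via es.query. A hedged sketch in Scala (the index name and id values are illustrative):

```scala
// Hedged sketch: push the _id filter down to ES with an "ids" query,
// rather than filtering the _metadata column after the read.
val df = spark.read.format("org.elasticsearch.spark.sql")
  .option("es.read.metadata", "true")
  .option("es.query", """{"query": {"ids": {"values": ["id-1", "id-2"]}}}""")
  .load("my-index")
```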
1 vote, 0 answers
Casting an incorrectly detected schema. PySpark-Elasticsearch
I am reading geo-point data from an Elasticsearch index using PySpark. I am creating my DataFrame using the following command:
us_df = spark.read.format('es').option('es.query', us_q).option('es.read.field.as.array.include',…

Pramod Sripada (241)
1 vote, 1 answer
Spark group-by with rank function is running very slow
I am writing a Spark app to find the top n accessed URLs within a time frame. But this job keeps running and takes hours for 389,451 records in ES, for instance. I want to reduce this time. I am reading from Elasticsearch in Spark as below:
val…

hard coder (5,449)
1 vote, 1 answer
Write to Elasticsearch from Spark is very slow
I am processing a text file and writing the transformed rows from a Spark application to Elasticsearch as below:
input.write.format("org.elasticsearch.spark.sql")
.mode(SaveMode.Append)
.option("es.resource", "{date}/" + dir).save()
This…

hard coder (5,449)
1 vote, 1 answer
Spark + Elasticsearch write performance issue
Seeing a low number of writes to Elasticsearch using Spark (Java). Here is the configuration:
using 13.xlarge machines for the ES cluster
4 instances, each with 4 processors
refresh interval set to -1, replicas set to 0, and other basic
configurations…

camelBeginner (21)
1 vote, 0 answers
Making an ES ForEachWriter sink idempotent with Structured Streaming in Spark
I am experiencing the same situation as described in "Spark structured streaming from kafka - last message processed again after resume from checkpoint". When I restart my Spark job after a failure, the last message gets processed again. One of the…

fledgling (991)