Questions tagged [elasticsearch-spark]
37 questions
3 votes, 1 answer
Incompatible OpenSearch 1.3 connector for Spark 3.x
We used to have Spark 2.4.4, Scala 2.11, and Elasticsearch 6.8 on our servers. Our servers were recently upgraded: Spark to 3.1.2 and Scala to 2.12. We then got the below error when writing records to Elasticsearch. So we…

Purna Mahesh (93)
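Errors like this after a Spark/Scala upgrade are typically an artifact mismatch: the ES-Hadoop connector is published per Spark line and Scala version. As a hedged sketch (the version number is illustrative, not taken from the question), the Spark 3.x / Scala 2.12 build lives under a separate artifact id:

```scala
// build.sbt — illustrative version; pick the release matching your cluster.
// %% appends the Scala suffix, resolving to elasticsearch-spark-30_2.12,
// the Spark 3.x build, instead of elasticsearch-spark-20_2.11 (Spark 2.x / Scala 2.11).
libraryDependencies += "org.elasticsearch" %% "elasticsearch-spark-30" % "7.15.0"
```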
3 votes, 1 answer
SparkContext: Error initializing SparkContext while running Spark job
I'm writing a Spark program that loads data from Elasticsearch to HDFS, but I get "Error initializing SparkContext" while running the job. The error occurs while creating the Spark session.
Hadoop: 3.2.1
Spark: 2.4.4
Elasticsearch Spark (for Spark…

Uttam Sapkota (69)
3 votes, 1 answer
org.elasticsearch.hadoop.rest.EsHadoopRemoteException: search_context_missing_exception: No search context found for id
Spark tasks are failing with "No search context found for id". I tried a couple of options, such as:
spark.es.input.max.docs.per.partition 250
spark.es.scroll.size 100
spark.es.batch.size.bytes 32mb
But tasks are still failing. We are using:
…

Sky (2,509)
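A search context expires when a scroll slice is not consumed before its keep-alive elapses, so alongside the partition and scroll-size settings above, the scroll keep-alive itself can be raised. A hedged sketch of setting these on the reader (the index name and "30m" value are illustrative):

```scala
// Hedged sketch: scroll tuning set as reader options; values are illustrative.
val df = spark.read.format("org.elasticsearch.spark.sql")
  .option("es.scroll.keepalive", "30m")            // keep the scroll context alive longer
  .option("es.input.max.docs.per.partition", "250")
  .option("es.scroll.size", "100")
  .load("my-index")
```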
3 votes, 2 answers
Elasticsearch could not write all entries: maybe ES was overloaded
I have an application where I read CSV files, do some transformations, and then push them to Elasticsearch from Spark itself, like this:
input.write.format("org.elasticsearch.spark.sql")
.mode(SaveMode.Append)
…

hard coder (5,449)
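"Could not write all entries" usually means Elasticsearch is rejecting bulk requests faster than the connector retries them, so the usual mitigation is to shrink the bulk batches and back off harder. A hedged sketch built on the write call above (the index name and all values are illustrative):

```scala
// Hedged sketch: throttle bulk writes so ES can keep up; values are illustrative.
input.write.format("org.elasticsearch.spark.sql")
  .mode(SaveMode.Append)
  .option("es.batch.size.entries", "500")      // smaller bulk requests
  .option("es.batch.write.retry.count", "6")   // more retries on rejection
  .option("es.batch.write.retry.wait", "30s")  // longer back-off between retries
  .save("my-index")
```

Reducing the number of concurrent Spark tasks writing to the cluster (fewer partitions) has the same effect from the other direction.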
3 votes, 1 answer
How to convert types when reading data from Elasticsearch using elasticsearch-spark in Spark
When I try to read data from Elasticsearch using the esRDD("index") function in elasticsearch-spark, I get the results as type org.apache.spark.rdd.RDD[(String, scala.collection.Map[String,AnyRef])]. And when I check the values, they are all of type…

PC9527 (308)
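The AnyRef values in each record map can be narrowed with ordinary pattern matching. A minimal sketch, assuming the caller knows the expected field type (the helper name and the Number/String cases are hypothetical, not from the question):

```scala
// Hedged sketch: narrowing an AnyRef value from an esRDD record map.
// Numeric ES fields may arrive boxed as java.lang.Integer or java.lang.Long,
// so matching on java.lang.Number covers both.
def asLong(doc: scala.collection.Map[String, AnyRef], field: String): Option[Long] =
  doc.get(field).collect {
    case n: java.lang.Number => n.longValue
    case s: String           => s.toLong
  }
```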
2 votes, 1 answer
Elasticsearch Spark reading slow
Reading from Elasticsearch v6.2 into Spark using the prescribed Spark connector org.elasticsearch:elasticsearch-spark-20_2.11:6.3.2 is horrendously slow. This is from a 3-node ES cluster with index:
curl https://server/_cat/indices?v
green open …

ixaxaar (6,411)
2 votes, 0 answers
Getting an error while writing to Elasticsearch from Spark with a custom mapping id
I'm trying to write a DataFrame from Spark to Elasticsearch with a custom mapping id, and when I do I get the below error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 14.0 failed 16 times, most recent…

knowledge_seeker (362)
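For context, a custom mapping id is set by naming an existing DataFrame column via es.mapping.id; failures here commonly trace back to that column being missing or null in some rows. A hedged sketch (the "doc_id" column and index name are hypothetical):

```scala
// Hedged sketch: use the "doc_id" column as the document _id.
// The named column must exist and be non-null in every row, or tasks fail.
df.write.format("org.elasticsearch.spark.sql")
  .option("es.mapping.id", "doc_id")
  .mode(SaveMode.Append)
  .save("my-index")
```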
1 vote, 0 answers
org.elasticsearch.hadoop.rest.EsHadoopRemoteException mapper_parsing_exception: failed to parse field with date
I am migrating an existing application that uses elasticsearch-spark 7.6.0 to the latest version, i.e. elasticsearch-spark-30_2.12:7.15.0. I am loading ES data with a date mapping as below:
"my_partition_key": {
"format": "yyyy-MM-dd",
…

Naresh G (83)
1 vote, 1 answer
Elasticsearch with Spark, dynamic index creation based on DataFrame column
I have a Spark DataFrame which has a column, say "name". The name can have different values within a single DataFrame. When I write my data to Elasticsearch using Spark (Scala), I want to write the data to different indexes based on the value of the…

mythic (535)
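The connector supports this directly: a {field} placeholder in the resource name is substituted per document from that column, routing each row to its own index. A hedged sketch using the "name" column from the question (the index pattern is illustrative):

```scala
// Hedged sketch: each row is written to index-<value of its "name" column>,
// e.g. a row with name = "alice" goes to index "index-alice".
df.write.format("org.elasticsearch.spark.sql")
  .mode(SaveMode.Append)
  .save("index-{name}")
```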
1 vote, 0 answers
How to filter a PySpark SQL DataFrame read from Elasticsearch by a metadata field (e.g. by _id)?
I am reading a PySpark SQL DataFrame from an Elasticsearch index with the read option es.read.metadata=True. I want to filter the data by a condition on a metadata field, but I get an empty result, although there should be one. Is it possible to get the…

David206 (85)
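One workaround worth noting: filters on the _metadata column are applied in Spark after the read, but an _id condition can instead be pushed down to Elasticsearch itself via es.query. A hedged sketch in Scala (the index name and id values are illustrative):

```scala
// Hedged sketch: push the _id filter down to ES with an "ids" query,
// rather than filtering the _metadata column after the read.
val df = spark.read.format("org.elasticsearch.spark.sql")
  .option("es.read.metadata", "true")
  .option("es.query", """{"query": {"ids": {"values": ["id-1", "id-2"]}}}""")
  .load("my-index")
```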
1 vote, 0 answers
Casting an incorrectly detected schema. PySpark-Elasticsearch
I am reading geo-point data from an Elasticsearch index using PySpark. I am creating my DataFrame using the following command:
us_df = spark.read.format('es').option('es.query', us_q).option('es.read.field.as.array.include',…

Pramod Sripada (241)
1 vote, 1 answer
Spark group-by with rank function is running very slow
I am writing a Spark app to find the top n accessed URLs within a time frame. But this job keeps running and takes hours for 389,451 records in ES, for instance. I want to reduce this time. I am reading from Elasticsearch in Spark as below:
val…

hard coder (5,449)
1 vote, 1 answer
Write to Elasticsearch from Spark is very slow
I am processing a text file and writing the transformed rows from a Spark application to Elasticsearch as below:
input.write.format("org.elasticsearch.spark.sql")
.mode(SaveMode.Append)
.option("es.resource", "{date}/" + dir).save()
This…

hard coder (5,449)
1 vote, 1 answer
Spark + Elasticsearch write performance issue
Seeing a low number of writes to Elasticsearch using Spark (Java). Here is the configuration:
using 13.xlarge machines for the ES cluster
4 instances, each with 4 processors
refresh interval set to -1, replicas set to 0, and other basic
configurations…

camelBeginner (21)
1 vote, 0 answers
Making an ES ForEachWriter sink idempotent with Structured Streaming in Spark
I am experiencing the same situation as described in "Spark structured streaming from kafka - last message processed again after resume from checkpoint". When I restart my Spark job after a failure, the last message gets processed again. One of the…

fledgling (991)