Questions tagged [elasticsearch-hadoop]

Elasticsearch real-time search and analytics natively integrated with Hadoop. Supports Map/Reduce, Cascading, Apache Hive, Apache Pig, Apache Spark and Apache Storm.

Requirements

An Elasticsearch cluster (0.9X series, or 1.0.0 or higher, which is highly recommended) accessible through REST. That's it! Significant effort has been invested to create a small, self-contained jar that can be downloaded and put to use without any dependencies. Simply make it available on your job classpath and you're set. For requirements specific to a given library, see its dedicated chapter.
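As a minimal sketch of what "accessible through REST" means in practice, the connector is pointed at the cluster through string settings such as `es.nodes`, `es.port`, `es.nodes.wan.only`, `es.net.http.auth.user` and `es.net.http.auth.pass` (the same keys that appear in the questions listed on this page). The helper name `es_hadoop_options` and its parameters are hypothetical, chosen only for illustration; the function just assembles a plain dictionary and does not contact any cluster.

```python
from typing import Dict, Optional


def es_hadoop_options(
    nodes: str,
    port: int = 9200,
    user: Optional[str] = None,
    password: Optional[str] = None,
    wan_only: bool = False,
) -> Dict[str, str]:
    """Assemble connector settings as a string-to-string map.

    Hypothetical helper: only the setting keys come from the connector,
    the function itself is an illustration.
    """
    # Host(s) and REST port of the Elasticsearch cluster.
    opts = {"es.nodes": nodes, "es.port": str(port)}
    # Restrict the connector to the declared nodes (useful for Docker or cloud setups).
    if wan_only:
        opts["es.nodes.wan.only"] = "true"
    # HTTP basic authentication, added only when credentials are supplied.
    if user is not None:
        opts["es.net.http.auth.user"] = user
    if password is not None:
        opts["es.net.http.auth.pass"] = password
    return opts


if __name__ == "__main__":
    print(es_hadoop_options("localhost", wan_only=True))
```

Assuming PySpark is in use and the connector jar is on the job classpath, a map like this could then be passed to a reader, for example `spark.read.format("org.elasticsearch.spark.sql").options(**opts).load("my-index")`, where `my-index` is a placeholder index name.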

Documentation

109 questions
1
vote
2 answers

Spark Runtime Error - ClassDefNotFound: SparkConf

After installing and building Apache Spark (albeit with quite a few warnings), the compilation of our Spark application (using "sbt package") completes successfully. However, when trying to run our application using the spark-submit script, a…
1
vote
0 answers

Jackson error in ElasticSearch Hadoop while loading data to ElasticSearch

I am trying to load data from HDFS to ElasticSearch using elasticsearch-hadoop version elasticsearch-hadoop-2.1.0.Beta3.jar. There was a bug on Mapr: https://github.com/elastic/elasticsearch-hadoop/issues/215 which was supposed to fix the jackson…
Sibish
  • 908
  • 1
  • 13
  • 20
1
vote
1 answer

Elasticsearch-Hadoop get Non-indexed data

I have an Elasticsearch cluster which holds a large amount of data. I want to extract all of the data from Elasticsearch into Hadoop (Hive). I used the Elasticsearch-Hadoop driver to extract data from Elasticsearch through a Hive external table, but it is too…
0
votes
0 answers

ElasticSearchHadoop throwing unauthorised exception

We are upgrading from Elastic version 6.3 to 7.8. We are using Elasticsearch-Hadoop to upload data to an Elastic index with Scala Spark. We are getting an unauthorized exception while uploading the data. The same code works fine with version 6.3. The…
0
votes
0 answers

An error occurred when using hive to query the es

I created a Hive external table to query existing data in ES, as shown below: CREATE EXTERNAL TABLE ods_es_data_inc (`agent_id` STRING, `dt_server_time` TIMESTAMP ) COMMENT 'bb_i_app' STORED BY…
Sam
  • 1
  • 1
0
votes
0 answers

How to save pyspark DataFrame to Elasticsearch (Running on Docker) using elasticsearch-hadoop

I am trying to write a pyspark DataFrame to an Elasticsearch instance running on Docker. I am unable to successfully connect to the Elasticsearch instance using elasticsearch-hadoop. When I try to save the DataFrame, I get an error that…
0
votes
0 answers

elasticsearch hadoop cannot parse value [] for field

There is a double field in the index I use that is empty. When I read the index with elasticsearch-spark-30_2.12-7.17.2.jar, the exception EsHadoopParsingException: Cannot parse value [] for field [X] is thrown, but when I replace the…
lucien
  • 1
  • 1
0
votes
1 answer

ElasticSearch hive SerializationError handler

Using Elastic search version 6.8.0 hive> select * from…
Syed Rafi
  • 825
  • 2
  • 12
  • 35
0
votes
1 answer

Hive to Elastic search ingestion issues

Using Elasticsearch version 6.8.0. The complete Hive job fails because of a single malformed JSON record. I tried changing 'es.write.rest.error.handler.es.return.default'='PASS/HANDLED', but no luck. Refer:…
Syed Rafi
  • 825
  • 2
  • 12
  • 35
0
votes
1 answer

Reading an Elasticsearch Index from PySpark

Could anyone tell me why this test script for PySpark errors out? (python 3.6.8, hadoop 3.3.1, spark 3.2.1, elasticsearch-hadoop 7.14) from pyspark.sql import SparkSession, SQLContext myspark = SparkSession.builder \ .appName("My test.") \ …
0
votes
1 answer

EsHadoopIllegalArgumentException: invalid map received dynamic=strict errors on elasticsearch-hadoop

Trying with both the DataFrame API and the RDD API: val map = collection.mutable.Map[String, String]() map("es.nodes.wan.only") = "true" map("es.port") = "reducted" map("es.net.http.auth.user") = "reducted" map("es.net.http.auth.pass") =…
alonisser
  • 11,542
  • 21
  • 85
  • 139
0
votes
1 answer

Spark 3.0 scala.None$ is not a valid external type for schema of string

While using the elasticsearch-hadoop library to read an Elasticsearch index with an empty attribute, I get the exception Caused by: java.lang.RuntimeException: scala.None$ is not a valid external type for schema of string. There is an open defect on GitHub…
Shivaji Mutkule
  • 1,020
  • 1
  • 15
  • 28
0
votes
1 answer

Invalid timestamp when reading Elasticsearch records with Spark

I'm getting an invalid timestamp when reading Elasticsearch records using Spark with the elasticsearch-hadoop library. I'm using the following Spark code to read the records: val sc = spark.sqlContext val elasticFields = Seq( "start_time", "action", …
Jacfal
  • 15
  • 6
0
votes
3 answers

Elasticsearch pyspark connection in insecure mode

My end goal is to insert data from HDFS into Elasticsearch, but the issue I am facing is connectivity. I am able to connect to my Elasticsearch node using the curl command below: curl -u username -X GET https://xx.xxx.xx.xxx:9200/_cat/indices?v'…
0
votes
1 answer

How to write dataframe with struct column into Elasticsearch via PySpark

I'm trying to write a dataframe containing a struct column into Elasticsearch: df1 = spark.createDataFrame([{"date": "2020.04.10","approach": "test", "outlier_score": 1, "a":"1","b":2}, {"date": "2020.04.10","approach": "test",…