Questions tagged [elasticsearch-hadoop]

Elasticsearch real-time search and analytics natively integrated with Hadoop. Supports Map/Reduce, Cascading, Apache Hive, Apache Pig, Apache Spark and Apache Storm.

Requirements

An Elasticsearch cluster (0.9X series, or 1.0.0 or higher, which is highly recommended) accessible through REST. That's it! Significant effort has been invested to create a small, self-contained jar with no dependencies that can be downloaded and put to use as is. Simply make it available to your job classpath and you're set. For requirements specific to a particular library, see its dedicated chapter.
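
As a minimal sketch of the setup described above, the helper below assembles a `spark-submit` command that puts the connector jar on the job classpath and points it at a cluster over REST. The jar path, script name, and host are placeholders rather than values from this page. The `spark.` prefix on the connector settings is there because Spark only forwards properties that start with it.

```python
def build_submit_command(jar_path, script, es_host, es_port=9200):
    """Return a spark-submit argv list that ships the connector jar.

    jar_path, script, and es_host are hypothetical placeholders.
    """
    return [
        "spark-submit",
        "--jars", jar_path,                       # make the jar available to the job
        "--conf", f"spark.es.nodes={es_host}",    # Elasticsearch host reachable over REST
        "--conf", f"spark.es.port={es_port}",     # REST port (9200 by default)
        script,
    ]


cmd = build_submit_command("elasticsearch-hadoop.jar", "job.py", "localhost")
print(" ".join(cmd))
```

Building the command as a list keeps each argument separate, so it can be passed to `subprocess.run` without shell quoting.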

Documentation

109 questions
0
votes
1 answer

Scripted_upsert with Elasticsearch-hadoop impossible?

With the Elasticsearch-hadoop connector, is it possible to set scripted_upsert to true on an upsert? I am using the es.update.script.inline configuration, but I can't find any way to set scripted_upsert to true and to empty the…
Luc E
  • 1,204
  • 8
  • 16
0
votes
0 answers

How to read 1M records from Elasticsearch into PySpark?

I have a problem reading data from Elasticsearch into a Spark cluster (I'm using a Zeppelin environment, so all connection settings are configured in the Zeppelin interpreter settings). First, I tried to read it with PySpark: %pyspark from…
0
votes
1 answer

EsHadoopIllegalArgumentException: Trouble connecting Hadoop to Elasticsearch

I'm using Databricks to run my Spark application, and I'm trying to use elasticsearch-hadoop to build a connection with Elasticsearch. After configuring a peering connection between my Databricks VPC and my Elasticsearch VPC, I can finally get the…
Adam
  • 482
  • 4
  • 15
0
votes
1 answer

Do Databricks workers and Elasticsearch nodes need to be in the same VPC in AWS?

I would like to write a dataframe into Elasticsearch from within Databricks. My Elasticsearch cluster is hosted on AWS and Databricks is spinning up EC2 instances with a certain role. That role has the permission to interact with my Elasticsearch…
0
votes
1 answer

How To Be Sure All Documents Written To Elasticsearch Integration Using Elasticsearch-Hadoop Connector In Spark Streaming

I am writing a DStream to Elasticsearch using the Elasticsearch-Hadoop connector, which is documented at https://www.elastic.co/guide/en/elasticsearch/hadoop/5.6/spark.html I need to process the window, write all the documents to ES…
0
votes
1 answer

create new SparkSession for different query?

I'd like to get two datasets from Elasticsearch: one filtered with a query, the other with no filter. // with query session = get_spark_session(query=query) df = session.read.option( "es.resource", "analytics-prod-2019.08.02" …
eugene
  • 39,839
  • 68
  • 255
  • 489
0
votes
1 answer

Elasticsearch Spark, how to query multiple times?

I'm working in a Jupyter notebook. I'd like to use the query DSL to prepare an initial DataFrame. I use conf.set("es.query", dsl_query) for that. (https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html#_querying) But then, I want to…
eugene
  • 39,839
  • 68
  • 255
  • 489
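
One possible approach to the question above, sketched under assumptions: instead of setting `es.query` once on the Spark configuration, the connector settings can be passed per read, so a single session can serve several queries. The index name `my-index` and the `status` field are hypothetical, and the commented PySpark lines assume the connector jar is already on the classpath.

```python
import json


def es_read_options(resource, query=None):
    """Build per-read connector options; omit es.query to read unfiltered."""
    opts = {"es.resource": resource}
    if query is not None:
        # es.query accepts a query DSL document serialized as a JSON string
        opts["es.query"] = json.dumps(query)
    return opts


# Hypothetical index and filter, for illustration only
filtered = es_read_options("my-index", {"query": {"term": {"status": "error"}}})
unfiltered = es_read_options("my-index")

# With an existing SparkSession named `spark` (not created here):
# df_filtered = spark.read.format("org.elasticsearch.spark.sql").options(**filtered).load()
# df_all = spark.read.format("org.elasticsearch.spark.sql").options(**unfiltered).load()
print(filtered)
print(unfiltered)
```

Keeping the options in plain dictionaries makes each read self-describing and avoids mutating shared session configuration between queries.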
0
votes
1 answer

How do I run a query against Elasticsearch using PySpark without querying every node?

My end goal is to use PySpark to efficiently index a large volume of data in Elasticsearch (ES), then run a huge number of queries against the index and record statistics over the results. Elasticsearch version 5.6.5 Spark version 2.4.0 Hadoop…
LaserJesus
  • 8,230
  • 7
  • 47
  • 65
0
votes
1 answer

Spark Scala - How to construct Scala Map from nested JSON?

I have nested JSON data with nested fields that I want to extract into a Scala Map. Here's the sample JSON: "nested_field": [ { "airport": "sfo", "score": 1.0 }, { "airport": "phx", "score": 1.0 }, { "airport":…
user2727704
  • 625
  • 1
  • 10
  • 21
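
The question asks for a Scala Map; the sketch below shows the same key-to-value extraction in Python, only to illustrate the shape of the transformation. The sample document is an assumption that reuses the two complete entries visible in the excerpt, and the function name is made up for this example.

```python
import json

# Assumed sample: only the two entries fully visible in the excerpt
sample = '{"nested_field": [{"airport": "sfo", "score": 1.0}, {"airport": "phx", "score": 1.0}]}'


def airport_scores(document):
    """Map each airport code to its score from the nested array."""
    entries = json.loads(document)["nested_field"]
    return {entry["airport"]: entry["score"] for entry in entries}


scores = airport_scores(sample)
print(scores)  # {'sfo': 1.0, 'phx': 1.0}
```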
0
votes
0 answers

Elasticsearch bails out ES-HADOOP PLUGIN

We are using the ES-Hadoop plugin to push data into an Elasticsearch cluster from a Hadoop HBase table. Below are the cluster details. elasticsearch version: 2.3.5 data nodes: 3 master nodes: 3 client node: 1 the data nodes are master nodes as…
0
votes
1 answer

How to setup Elasticsearch Structured Streaming with X-Pack enabled?

I'm trying to use Elasticsearch (ES) Hadoop 6.1.1 with X-Pack installed to write data using Spark Structured Streaming 2.2.1. This is my code (the index already exists in Elasticsearch): val exceptions = spark .readStream .text(path) val advancedQuery…
0
votes
2 answers

distinct count on hive does not match cardinality count on elasticsearch

I have loaded data into my Elasticsearch cluster from Hive using the elasticsearch-hadoop plugin from Elastic. I need to fetch a count of unique account numbers. I have written the following queries in both HQL and Query DSL, but they are returning…
summerNight
  • 1,446
  • 3
  • 25
  • 52
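
For context on the mismatch described above: the Elasticsearch `cardinality` aggregation returns an approximate count, while a distinct count in Hive is exact, so some difference is expected on large sets. Whether that explains this particular case is not known from the excerpt. The sketch below builds a request body that raises `precision_threshold` to its maximum of 40000; the field name `account_number` and the aggregation name are hypothetical.

```python
import json


def cardinality_query(field, precision_threshold=40000):
    """Build a search body that counts unique values of `field`.

    Counts below precision_threshold are expected to be close to exact;
    above it, the result is an approximation.
    """
    return {
        "size": 0,  # return only the aggregation, no hits
        "aggs": {
            "unique_count": {
                "cardinality": {
                    "field": field,
                    "precision_threshold": precision_threshold,
                }
            }
        },
    }


body = cardinality_query("account_number")  # hypothetical field name
print(json.dumps(body, indent=2))
```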
0
votes
1 answer

How to ignore exceptions when bulk update with pyspark if doc doesn't exist

I am trying to do an update operation with the elasticsearch-hadoop package in PySpark. The documentation says that if no document is found, an exception is thrown. What is the best way to ignore this exception in PySpark? Or is it possible to pass…
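
A minimal sketch of the write settings involved, assuming that inserting missing documents is acceptable: the connector's `upsert` operation inserts when the document does not exist and updates when it does, whereas `update` fails on a missing id. This does not ignore the exception, it avoids it. The index `my-index` and id column `doc_id` are hypothetical.

```python
def es_write_options(resource, id_field, operation="upsert"):
    """Build connector write options for update-style operations."""
    allowed = {"index", "create", "update", "upsert"}
    if operation not in allowed:
        raise ValueError(f"unsupported es.write.operation: {operation}")
    return {
        "es.resource": resource,
        "es.write.operation": operation,
        "es.mapping.id": id_field,  # column holding the document id
    }


opts = es_write_options("my-index", "doc_id")  # hypothetical names

# With an existing DataFrame `df` (not created here):
# df.write.format("org.elasticsearch.spark.sql").options(**opts).mode("append").save()
print(opts)
```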
0
votes
0 answers

How to form a list of maps in java spark

root
 |-- code: string (nullable = true)
 |-- mnemonic: string (nullable = true)
 |-- key: long (nullable = true)
 |-- country: string (nullable = true)
 |-- crossID: struct (nullable = false)
 |    |-- codeKey: long (nullable = true)
 |    |--…
0
votes
2 answers

java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror

java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror; at…
Echo
  • 35
  • 1
  • 7