Questions tagged [elasticsearch-hadoop]

Elasticsearch real-time search and analytics natively integrated with Hadoop. Supports Map/Reduce, Cascading, Apache Hive, Apache Pig, Apache Spark and Apache Storm.

Requirements

An Elasticsearch cluster (0.9X series, or 1.0.0 or higher, which is highly recommended) accessible through REST. That's it! Significant effort has been invested to create a small, self-contained jar that can be downloaded and put to use without any dependencies. Simply make it available to your job classpath and you're set. For the requirements of a specific library, see its dedicated chapter.
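Because the only hard requirement is a cluster reachable through REST, one quick pre-flight check is to read the version from a node's root endpoint (`GET /`), whose JSON response includes `version.number`. A minimal sketch, assuming a node at `http://localhost:9200/`; the helper names `fetch_root`, `parse_es_version` and `meets_recommended` are hypothetical and not part of elasticsearch-hadoop.

```python
import json
import urllib.request


def fetch_root(url: str = "http://localhost:9200/") -> str:
    """Return the raw JSON body of GET / on an Elasticsearch node (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def parse_es_version(root_body: str) -> tuple:
    """Extract (major, minor, patch) from the root-endpoint JSON."""
    number = json.loads(root_body)["version"]["number"]  # e.g. "7.10.2"
    core = number.split("-")[0]  # drop suffixes such as "-SNAPSHOT"
    return tuple(int(part) for part in core.split(".")[:3])


def meets_recommended(version: tuple) -> bool:
    """True for 1.0.0 or higher, the range the requirements call highly recommended."""
    return version >= (1, 0, 0)
```

Against a live node, `parse_es_version(fetch_root())` returns the version tuple. The jar itself can then be put on the classpath with, for example, Spark's `spark.jars` property or the `--jars` flag of `spark-submit`.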

Documentation

109 questions

1 answer

Scripted_upsert with Elasticsearch-hadoop impossible?

With the Elasticsearch-hadoop connector, is it possible to set scripted_upsert to true on an upsert? I am using the es.update.script.inline configuration, but I can't find any way to set scripted_upsert to true and to empty the…

apache-spark elasticsearch elasticsearch-hadoop

asked Mar 23 '20 at 09:31

Luc E (1,204 reputation; badge counts 8, 16)
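For the scripted_upsert question above, this is a minimal sketch of the write options for a script-based upsert, using documented elasticsearch-hadoop settings (`es.write.operation`, `es.mapping.id`, `es.update.script.inline`, `es.update.script.lang`, `es.update.script.params`). The helper name, the Painless language choice and the example script are assumptions. It deliberately does not set scripted_upsert: with a plain upsert, Elasticsearch runs the script only when the document already exists and otherwise indexes the upsert document as-is, and whether the connector exposes a switch for that is exactly what the question asks.

```python
def upsert_with_script_options(id_field: str, script: str, params: dict) -> dict:
    """Build elasticsearch-hadoop write options for a script-based upsert (hypothetical helper)."""
    # es.update.script.params takes comma-separated name:field pairs;
    # a constant value is written in angle brackets, e.g. inc:<1>.
    param_spec = ",".join(f"{name}:{source}" for name, source in params.items())
    return {
        "es.write.operation": "upsert",      # update if the id exists, insert otherwise
        "es.mapping.id": id_field,           # document field used as the _id
        "es.update.script.inline": script,   # inline script, as in the question
        "es.update.script.lang": "painless", # assumption: Elasticsearch 5.x or later
        "es.update.script.params": param_spec,
    }
```

The dictionary can be passed to a DataFrame write, e.g. `df.write.format("org.elasticsearch.spark.sql").options(**opts).mode("append").save("my-index")`, where `my-index` is a placeholder.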

0 answers

How to read 1M records from Elasticsearch into PySpark?

I have a problem reading data from Elasticsearch into a Spark cluster (I'm using a Zeppelin environment, so all connection settings are configured in the Zeppelin interpreter settings). First, I tried to read it with PySpark: %pyspark from…

scala apache-spark elasticsearch pyspark elasticsearch-hadoop

asked Jan 29 '20 at 14:02

Andrey Sapegin

1 answer

EsHadoopIllegalArgumentException: Trouble connecting Hadoop to Elasticsearch

I'm using Databricks to run my Spark application, and I'm trying to use elasticsearch-hadoop to build a connection with Elasticsearch. After configuring a peering connection between my Databricks VPC and my Elasticsearch VPC, I can finally get the…

apache-spark elasticsearch databricks elasticsearch-hadoop

asked Jan 27 '20 at 02:52

Adam

1 answer

Do Databricks workers and Elasticsearch nodes need to be in the same VPC in AWS?

I would like to write a dataframe into Elasticsearch from within Databricks. My Elasticsearch cluster is hosted on AWS and Databricks is spinning up EC2 instances with a certain role. That role has the permission to interact with my Elasticsearch…

amazon-web-services elasticsearch databricks elasticsearch-hadoop

asked Dec 15 '19 at 04:31

Adam
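For the two Databricks/AWS questions above, a commonly relevant setting is `es.nodes.wan.only`, which makes the connector talk only to the declared `es.nodes` instead of discovering individual cluster nodes, whose internal addresses are often unreachable from another network. A minimal sketch of the connection options, assuming a single HTTPS endpoint on port 443; the helper name and the endpoint are hypothetical, and the sketch does not handle IAM request signing.

```python
def wan_only_options(endpoint: str, port: int = 443, use_ssl: bool = True) -> dict:
    """Connection options for a cluster reached through one endpoint (hypothetical helper)."""
    return {
        "es.nodes": endpoint,                     # hypothetical domain endpoint
        "es.port": str(port),                     # connector settings are passed as strings
        "es.net.ssl": "true" if use_ssl else "false",
        # Disable node discovery and route all reads and writes through es.nodes.
        "es.nodes.wan.only": "true",
    }
```

The same dictionary works for reads and writes, e.g. `spark.read.format("org.elasticsearch.spark.sql").options(**wan_only_options(host)).load(index)`.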

1 answer

How To Be Sure All Documents Written To Elasticsearch Integration Using Elasticsearch-Hadoop Connector In Spark Streaming

I am writing a DStream to Elasticsearch using the Elasticsearch-Hadoop connector, documented at https://www.elastic.co/guide/en/elasticsearch/hadoop/5.6/spark.html. I need to process the window, write all the documents to ES…

apache-spark elasticsearch spark-streaming spark-streaming-kafka elasticsearch-hadoop

asked Sep 27 '19 at 13:46

Yılmaz

1 answer

create new SparkSession for different query?

I'd like to get two datasets from Elasticsearch: one filtered with a query, the other with no filter. // with query session = get_spark_session(query=query) df = session.read.option( "es.resource", "analytics-prod-2019.08.02" …

python apache-spark elasticsearch pyspark elasticsearch-hadoop

asked Aug 21 '19 at 02:43

eugene (39,839 reputation; badge counts 68, 255, 489)

1 answer

Elasticsearch Spark, how to query multiple times?

I'm working in a Jupyter notebook. I'd like to use the Query DSL to prepare an initial DataFrame, and I use conf.set("es.query", dsl_query) for that (https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html#_querying). But then, I want to…

apache-spark elasticsearch pyspark elasticsearch-hadoop

asked Feb 13 '19 at 12:12

eugene (39,839 reputation; badge counts 68, 255, 489)
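For the two questions above about running several queries, one approach is to pass `es.query` as an option on each individual read rather than on the session configuration, so one SparkSession can yield both a filtered and an unfiltered DataFrame. A minimal sketch, assuming `es.query` is not also set globally on the session; the helper names are hypothetical, while `format("org.elasticsearch.spark.sql")`, `.options()` and `.load()` are standard connector/Spark calls.

```python
import json
from typing import Optional


def read_options(query: Optional[dict] = None) -> dict:
    """Per-read connector options; es.query is omitted when no filter is wanted (hypothetical helper)."""
    opts = {}
    if query is not None:
        # es.query accepts a Query DSL body of the form {"query": {...}} as JSON.
        opts["es.query"] = json.dumps(query)
    return opts


def read_index(spark, index: str, query: Optional[dict] = None):
    """Load `index` as a DataFrame with its own query, reusing the given SparkSession."""
    return (
        spark.read.format("org.elasticsearch.spark.sql")
        .options(**read_options(query))
        .load(index)
    )
```

For example, `read_index(spark, "analytics-prod-2019.08.02", {"query": {"term": {"status": "ok"}}})` and `read_index(spark, "analytics-prod-2019.08.02")` give the filtered and unfiltered DataFrames; the `status` field is a placeholder.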

1 answer

How do I run a query against Elasticsearch using PySpark without querying every node?

My end goal is to use PySpark to efficiently index a large volume of data in Elasticsearch (ES), then run a huge number of queries against the index and record statistics over the results. Elasticsearch version: 5.6.5, Spark version: 2.4.0, Hadoop…

python apache-spark elasticsearch pyspark elasticsearch-hadoop

asked Feb 13 '19 at 00:23

LaserJesus (8,230 reputation; badge counts 7, 47, 65)

1 answer

Spark Scala - How to construct Scala Map from nested JSON?

I have nested JSON data with nested fields that I want to extract to construct a Scala Map. Here's the sample JSON: "nested_field": [ { "airport": "sfo", "score": 1.0 }, { "airport": "phx", "score": 1.0 }, { "airport":…

scala apache-spark elasticsearch-hadoop

asked Jan 03 '19 at 02:34

user2727704

0 answers

Elasticsearch bails out ES-HADOOP PLUGIN

We are using the ES-Hadoop plugin to push data into an Elasticsearch cluster from a Hadoop HBase table. Below are the cluster details: Elasticsearch version: 2.3.5; data nodes: 3; master nodes: 3; client nodes: 1. The data nodes are master nodes as…

hadoop apache-spark elasticsearch elasticsearch-hadoop

asked Feb 24 '18 at 09:34

chitender kumar

1 answer

How to setup Elasticsearch Structured Streaming with X-Pack enabled?

I'm trying to use Elasticsearch (ES) 6.1.1 Hadoop with X-Pack installed to write data using Spark Structured Streaming 2.2.1. This is my code (the index already exists in Elasticsearch): val exceptions = spark .readStream .text(path) val advancedQuery…

apache-spark spark-structured-streaming elasticsearch-hadoop

asked Jan 17 '18 at 13:14

Matthias Mueller
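For the X-Pack Structured Streaming question above, a minimal sketch of the sink options: basic-auth credentials via the documented `es.net.http.auth.user` and `es.net.http.auth.pass` settings, plus the `checkpointLocation` that Spark Structured Streaming sinks need. The helper names, credentials and paths are hypothetical; `format("es")` follows the connector's Structured Streaming examples, and a TLS-enabled cluster would additionally need `es.net.ssl` and related settings, which are not shown.

```python
def secured_stream_options(user: str, password: str, checkpoint_dir: str) -> dict:
    """Sink options for writing a stream to a secured cluster (hypothetical helper)."""
    return {
        "es.net.http.auth.user": user,      # basic-auth user for the cluster
        "es.net.http.auth.pass": password,  # basic-auth password
        # Structured Streaming uses this directory to track progress between batches.
        "checkpointLocation": checkpoint_dir,
    }


def start_es_stream(df, opts: dict, resource: str):
    """Start writing a streaming DataFrame to the given index (hypothetical helper)."""
    return df.writeStream.format("es").options(**opts).start(resource)
```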

2 answers

distinct count on hive does not match cardinality count on elasticsearch

I have loaded data into my Elasticsearch cluster from Hive using the elasticsearch-hadoop plugin from Elastic. I need to fetch a count of unique account numbers. I have the following queries written in both HQL and Query DSL, but they are returning…

elasticsearch hive hiveql querydsl elasticsearch-hadoop

asked Sep 15 '17 at 20:12

summerNight (1,446 reputation; badge counts 3, 25, 52)
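For the distinct-count question above, one relevant fact is that Hive's `COUNT(DISTINCT ...)` is exact, while Elasticsearch's `cardinality` aggregation is approximate (HyperLogLog++), with accuracy governed by `precision_threshold`, whose maximum effective value is 40000. A minimal sketch of the request body, assuming the account number is stored in a keyword or numeric field; the helper name, the field name and the aggregation name `unique_accounts` are hypothetical.

```python
import json


def unique_count_body(field: str, precision_threshold: int = 40000) -> str:
    """Query DSL body for an approximate distinct count (hypothetical helper)."""
    body = {
        "size": 0,  # only the aggregation result is needed, not the hits
        "aggs": {
            "unique_accounts": {
                "cardinality": {
                    "field": field,
                    # Elasticsearch treats thresholds above 40000 as 40000; clamp for clarity.
                    "precision_threshold": min(precision_threshold, 40000),
                }
            }
        },
    }
    return json.dumps(body)
```

Counts below the threshold are expected to be close to exact; above it, some divergence from the Hive result is expected rather than a loading error.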

1 answer

How to ignore exceptions when bulk update with pyspark if doc doesn't exist

I am trying to do an update operation with the elasticsearch-hadoop package in PySpark. The documentation says that if no data is found, an exception is thrown. What is the best way to ignore this exception in PySpark? Or is it possible to pass…

python apache-spark elasticsearch pyspark elasticsearch-hadoop

asked Jun 23 '17 at 22:28

amstree
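For the update question above: per the connector's configuration reference, `es.write.operation=update` throws when a document with the given id does not exist, whereas `upsert` inserts it instead. If creating the missing documents is acceptable, switching the operation avoids the exception; if not, the rows would have to be filtered before writing. A minimal sketch with a hypothetical helper name:

```python
def write_options(id_field: str, create_missing: bool) -> dict:
    """Options for id-based partial writes (hypothetical helper)."""
    return {
        # "update" fails for ids that do not exist yet; "upsert" inserts them instead.
        "es.write.operation": "upsert" if create_missing else "update",
        "es.mapping.id": id_field,  # update and upsert both need the document id
    }
```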

0 answers

How to form a list of maps in java spark

java apache-spark-sql elasticsearch-hadoop

asked May 15 '17 at 17:45

Faisal Ahamed R

2 answers

java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror

java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror; at…

scala apache-spark elasticsearch-hadoop

asked Apr 20 '17 at 07:25

Echo
