Questions tagged [elasticsearch-hadoop]

Elasticsearch real-time search and analytics natively integrated with Hadoop. Supports Map/Reduce, Cascading, Apache Hive, Apache Pig, Apache Spark and Apache Storm.

Requirements

An Elasticsearch cluster (0.9x series, or 1.0.0 or higher, which is highly recommended) accessible through REST. That's it! Significant effort has been invested to create a small, self-contained jar that can be downloaded and put to use without any dependencies. Simply make it available on your job classpath and you're set. For requirements specific to a particular library, see the dedicated chapter.

Documentation

109 questions
3
votes
0 answers

Elasticsearch-Hadoop connector for Spark Dataframe

I am trying to write a Spark DataFrame to Elasticsearch as follows: df.write.format("es").save("db/test") Unfortunately, I receive the following error: Py4JJavaError: An error occurred while calling o50.save.: org.apache.spark.SparkException: Job…
Stijn
  • 459
  • 2
  • 8
  • 18
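
For context, here is a minimal sketch of how a write like the one in this question is commonly configured through the connector's Spark SQL data source. It assumes the elasticsearch-hadoop (or elasticsearch-spark) jar is already on the job classpath. The helper name `write_to_es`, the default host, and the default port are illustrative assumptions, and the sketch does not attempt to diagnose the truncated error above.

```python
def write_to_es(df, resource, nodes="localhost", port=9200):
    """Write a Spark DataFrame to Elasticsearch through elasticsearch-hadoop.

    `df` is expected to be a pyspark.sql.DataFrame and `resource` the target
    index (or index/type on older clusters), for example "db/test".
    The default host and port are assumptions for a local cluster.
    """
    options = {
        "es.nodes": nodes,     # Elasticsearch host(s) reachable over REST
        "es.port": str(port),  # REST port, passed as a string
    }
    (
        df.write
        .format("org.elasticsearch.spark.sql")  # the connector's data source
        .options(**options)
        .mode("append")
        .save(resource)
    )
    return options
```

A call might look like `write_to_es(df, "db/test", nodes="es-host")`, where `es-host` is a placeholder for a reachable Elasticsearch node.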
3
votes
2 answers

Indexing tuples from storm to elasticsearch with elasticsearch-hadoop library does not work

I want to index documents into Elasticsearch from Storm, but I couldn't get any document to be indexed into Elasticsearch. In my topology I have a KafkaSpout that emits a JSON document like this { "tweetId": 1, "text": "hello" } to an EsBolt that is a native…
3
votes
1 answer

What is the Best way to insert Entries into ElasticSearch?

I am new to Elasticsearch and I have a file with 180 fields and 12 million lines. I have created an index and type in Elasticsearch and a Java program, but it takes 1.5 hours. Is there a better way to load data into Elasticsearch with reduced…
Jerin J
  • 75
  • 1
  • 5
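
One general direction for large loads like this is to send documents in batches through Elasticsearch's `_bulk` REST endpoint rather than one request per document. The sketch below only builds the newline-delimited request bodies; it does not send anything. The function names, the index name, and the batch size are hypothetical, and the right batch size depends on the cluster and the document size.

```python
import json


def bulk_index_body(index, docs):
    """Build a newline-delimited body for the Elasticsearch _bulk endpoint.

    Each document becomes two lines: an action line naming the target index,
    followed by the document source itself.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline; an empty batch yields no body.
    return "\n".join(lines) + "\n" if lines else ""


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical usage: split parsed rows into batches and build one body each.
rows = [{"id": i, "value": f"row-{i}"} for i in range(5)]
bodies = [bulk_index_body("my-index", batch) for batch in chunked(rows, 2)]
```

Each body would then be POSTed to `/_bulk` with the `application/x-ndjson` content type.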
3
votes
1 answer

Build failure while building a project using ElasticSearch-Hadoop

I am unable to build a Java project which uses ElasticSearch-Hadoop. This is the error that I am seeing, when I try to build my project: Scanning for projects... ------------------------------------------------------------------------ Building…
3
votes
1 answer

HiveServer2 unable to load EsStorageHandler class from elasticsearch-hadoop

I have this configuration in hive-site.xml: hive.aux.jars.path /path/to/elasticsearch-hadoop-2.0.1.jar When I map data to Elasticsearch in the Hive CLI, it works correctly with this code: CREATE…
2
votes
1 answer

Elasticsearch best practices: is it a good idea to put HAProxy in front of Elasticsearch 7?

In the Elasticsearch Spark/Hadoop documentation, I can read about the following option: es.nodes.wan.only (default: false) Whether the connector is used against an Elasticsearch instance in a cloud/restricted environment over the WAN, such as Amazon…
Klun
  • 78
  • 2
  • 25
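
To make the option in this question concrete, here is a small sketch of the connector settings typically used when Elasticsearch is reached through a single endpoint such as a proxy or load balancer. With `es.nodes.wan.only` enabled, the connector disables node discovery and connects only through the declared `es.nodes`. The helper name `wan_only_settings` and the endpoint value are hypothetical; whether a proxy is a good idea for a given cluster is not something the sketch answers.

```python
def wan_only_settings(endpoint, port=9200):
    """Return elasticsearch-hadoop settings for a proxied or WAN-only setup.

    `endpoint` is the single address the job is allowed to talk to, for
    example a load balancer in front of the cluster (hypothetical here).
    """
    return {
        "es.nodes": endpoint,         # only this address is contacted
        "es.port": str(port),         # REST port exposed by the endpoint
        "es.nodes.wan.only": "true",  # disable discovery of cluster nodes
    }


# Hypothetical usage: these settings can be passed as Spark options.
settings = wan_only_settings("es-proxy.example.internal")
```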
2
votes
0 answers

Spark UI stuck while attempting to create Dynamic Dataframes

I am using Spark (2.2.0) with Elasticsearch Hadoop (7.6.0). The purpose of my Spark job is to process records from a Parquet file and append them by unique to documents already present in Elasticsearch. Since Elasticsearch doesn't support updates, the…
2
votes
1 answer

Data load from HDFS to ES taking very long time

I have created an external table in Hive and need to move the data to ES (2 nodes, each with 1 TB). The regular query below takes a very long time (more than 6 hours) for a source table with 9 GB of data. INSERT INTO TABLE…
2
votes
1 answer

How to reindex data from one Elasticsearch cluster to another with elasticsearch-hadoop in Spark

I have two separate Elasticsearch clusters and want to reindex the data from the first cluster to the second, but I found that I can only set up one Elasticsearch cluster inside the SparkContext configuration, such as: var sparkConf : SparkConf =…
Jack
  • 5,540
  • 13
  • 65
  • 113
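
One possible approach to this kind of cluster-to-cluster copy is to leave the cluster address out of the shared Spark configuration and instead set `es.nodes` separately on the read and on the write. The sketch below assumes the connector's Spark SQL data source is on the classpath; the function name `copy_index` and all host and index names are hypothetical, and it is written in Python rather than the Scala used in the question.

```python
ES_FORMAT = "org.elasticsearch.spark.sql"  # the connector's Spark SQL source


def copy_index(spark, source_nodes, source_index, target_nodes, target_index):
    """Read an index from one cluster and append it to an index on another.

    `spark` is expected to be a pyspark.sql.SparkSession. The per-operation
    `es.nodes` option points each side at its own cluster.
    """
    df = (
        spark.read
        .format(ES_FORMAT)
        .option("es.nodes", source_nodes)  # cluster to read from
        .load(source_index)
    )
    (
        df.write
        .format(ES_FORMAT)
        .option("es.nodes", target_nodes)  # cluster to write to
        .mode("append")
        .save(target_index)
    )
    return df
```

A call might look like `copy_index(spark, "es-a", "logs", "es-b", "logs-copy")`, with both host names standing in for real cluster addresses.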
2
votes
1 answer

Apache Spark: JOINing RDDs (data sets) using custom criteria/fuzzy matching

Is it possible to join two (Pair)RDDs (or Datasets/DataFrames) (on multiple fields) using some "custom criteria"/fuzzy matching, e.g. range/interval for numbers or dates and various "distance methods", e.g. Levenshtein, for strings? For "grouping"…
2
votes
1 answer

Spark (Java) to Elasticsearch

I am testing loading data from a CSV file into Spark and then saving it in Elasticsearch, but I am having some trouble saving my RDD collection in Elasticsearch using Spark. This error is raised when submitting the job: Exception in thread "main"…
2
votes
2 answers

Spark-Cassandra Vs Spark-Elasticsearch

I have been using Elasticsearch for quite some time now and have a little experience using Cassandra. Now, I have a project where we want to use Spark to process the data, but I need to decide if we should use Cassandra or Elasticsearch as the datastore to load my…
2
votes
2 answers

Is a JOIN operation possible in Elasticsearch using any ES connector for Presto or Hive (elasticsearch-hadoop)?

As we know, a JOIN operation is not possible among indices in Elasticsearch. Can it be achieved using Presto or Hive, i.e. can we do a JOIN operation using any Elasticsearch connector for Presto or Hive? Can we do JOIN in Elasticsearch using…
sumanth232
  • 587
  • 7
  • 20
2
votes
1 answer

Writing Hadoop reduce output to Elasticsearch

I'm having a bit of trouble understanding how to write the output of a simple Hadoop job back into Elasticsearch. Job is configured…
Eddy
  • 1,662
  • 2
  • 21
  • 36
1
vote
0 answers

es.read.source.filter vs. es.read.field.include when reading data with elasticsearch-hadoop

When reading data from Elasticsearch with elasticsearch-hadoop, there are two options to specify how to read a subset of fields from the source, according to the official documentation, i.e. es.read.field.include: Fields/properties that are parsed…