Questions tagged [elasticsearch-hadoop]

Elasticsearch real-time search and analytics natively integrated with Hadoop. Supports Map/Reduce, Cascading, Apache Hive, Apache Pig, Apache Spark and Apache Storm.

Requirements

An Elasticsearch cluster (0.9X series, or 1.0.0 or higher, which is highly recommended) accessible through REST. That's it! Significant effort has been invested to create a small, dependency-free, self-contained jar that can be downloaded and put to use as-is. Simply make it available on your job classpath and you're set. For requirements specific to a particular library, see its dedicated chapter.

Documentation

109 questions
1 vote, 0 answers

Unable to create external table in elasticsearch using es-hadoop

I am running a simple spark-submit job, e.g.: spark-submit --class com.x.y.z.logan /home/test/spark/sample.jar Table in jar file: hiveContext.sql("CREATE TABLE IF NOT EXISTS databasename.tablename(es_column_name STRING)…
1 vote, 1 answer

Pyspark converting rdd to dataframe with nulls

I am using PySpark (1.6) and elasticsearch-hadoop (5.1.1). I am getting my data from Elasticsearch into an RDD via: es_rdd = sc.newAPIHadoopRDD( …
wrdeman
  • 810
  • 10
  • 23
1 vote, 1 answer

How to fix an error when an empty string is being written to elastic search from an Apache Spark job?

An exception is thrown when I execute my Scala app, which calls myRDD.saveToEs (I also tried saveToEs from a DataFrame). My ES version is 2.3.5. I am using Spark 1.5.0 so maybe there is a way to configure this in the…
ZeroGraviti
  • 1,047
  • 2
  • 12
  • 28
1 vote, 1 answer

Spark Web UI "take at SerDeUtil.scala:201" interpretation

I am creating a Spark RDD by loading data from Elasticsearch using the elasticsearch-hadoop connector in python (importing pyspark) as: es_cluster_read_conf = { "es.nodes" : "XXX", "es.port" : "XXX", "es.resource" :…
Manav Garg
  • 512
  • 1
  • 3
  • 17
1 vote, 1 answer

How to search multiple indices using elasticsearch hadoop

Suppose the following scenario: we have the indices index-1, index-2, and index-4. Yes, for some reason 'index-3' is missing, but I don't know that at search time, so I'd like to search an index pattern like "index-1,index-2,index-3,index-4", in…
Qichu Gong
  • 61
  • 1
1 vote, 1 answer

How to set es.nodes parameter to multiple Elasticsearch nodes for Spark ?

So I want to read data from multiple Elasticsearch nodes into Spark. I prefer to use the "es.nodes" parameter and set "es.nodes.discovery" to false. The configuration parameters are described here. I tried to find some example on how to set…
ZianyD
  • 171
  • 2
  • 12
1 vote, 1 answer

Mapping field names of the output from Spark-Streaming to Elastic Search

I am using the following code to store the output of Spark Streaming in Elasticsearch. I want to map the output of Spark Streaming to the proper names, i.e. (Key, OsName, PlatFormName, Mobile, BrowserName, Count). But as you can see currently it is…
Naresh
  • 5,073
  • 12
  • 67
  • 124
1 vote, 0 answers

Truncate elastic search hive tables

I am using the Elasticsearch Hive integration so that I can query Hadoop tables, send alerts when data is bad (with ElastAlert), and display the data in Kibana. This is how I created the Elastic table: CREATE EXTERNAL TABLE my_elastic_table ( …
yuan0122
  • 441
  • 9
  • 18
1 vote, 1 answer

Issue when writing to elasticsearch using es-hadoop

I am getting this exception when I try to write to Elasticsearch using a MapReduce program with es-hadoop. I am trying to write to index=employee and type=basic, which already exist in my Elasticsearch cluster. My stack trace: Exception in thread…
Sachin
  • 1,675
  • 2
  • 19
  • 42
1 vote, 1 answer

Is there a way to apply multiple groupings in storm?

I want to apply "Fields grouping" as well as "Local or shuffle grouping" to my topology such that each spout sends data to local bolts only but also uses a field in my document to decide what local-bolts it should go to. So if there were two worker…
user2250246
  • 3,807
  • 5
  • 43
  • 71
1 vote, 0 answers

Bind Elastic-Search to localhost as well as an IP address

The modules-network page in the Elasticsearch documentation says that it can bind to more than one network address by specifying an array of IP addresses in network.bind_host. I put the following in my config/elasticsearch.yaml: # Used a real IP address in the…
user2250246
  • 3,807
  • 5
  • 43
  • 71
1 vote, 1 answer

FAILED: SemanticException Cannot find class 'org.elasticsearch.hadoop.hive.ESStorageHandler'

I am following the example at https://gist.github.com/costin/8025827 and am not sure why I am getting this error. Any response is highly appreciated. hive> ADD JAR hdfs:///auxlib/elasticsearch-hadoop-2.2.0.jar ; converting to…
1 vote, 1 answer

Writing json from HDFS to Elasticsearch using elasticsearch-hadoop map-reduce

We have some JSON data stored in HDFS and we are trying to use elasticsearch-hadoop MapReduce to ingest it into Elasticsearch. The code we used is very simple (below): public class TestOneFileJob extends Configured implements Tool { public…
Fanooos
  • 2,718
  • 5
  • 31
  • 55
1 vote, 1 answer

how to index json to elasticsearch using hadoop map-reduce and es-hadoop?

I have a huge set of data stored in HDFS that we want to index into Elasticsearch. The obvious approach is to use the elasticsearch-hadoop library. I followed the concept in this video and here is the code I wrote for this job: public class…
Fanooos
  • 2,718
  • 5
  • 31
  • 55
1 vote, 1 answer

Spark machine learning and Elasticsearch analyzed tokens/text in Python

I'm trying to build an application that indexes a bunch of documents in Elasticsearch and retrieves the documents through Boolean queries into Spark for machine learning. I'm trying to do all of this in Python, through PySpark and…
plam
  • 1,305
  • 3
  • 15
  • 24