Questions tagged [elasticsearch-hadoop]

Elasticsearch real-time search and analytics natively integrated with Hadoop. Supports Map/Reduce, Cascading, Apache Hive, Apache Pig, Apache Spark and Apache Storm.

Requirements

An Elasticsearch cluster (0.9X series, or 1.0.0 or higher, which is highly recommended) accessible through REST. That's it! Significant effort has been invested to create a small, self-contained jar with no dependencies that can be downloaded and put to use right away. Simply make it available on your job classpath and you're set. For requirements specific to a particular library, see the dedicated chapter.

Documentation

109 questions

1 answer

Retrieve metrics from elasticsearch-spark

At the end of an ETL Cascading job, I am extracting metrics about the Elasticsearch ingestion using the Hadoop counters that elasticsearch-hadoop exposes. I want to do the same using Spark, but I can't find documentation related to…

asked Apr 03 '17 at 14:00

angelcervera

1 answer

Is it possible to write to a dynamically created Elasticsearch index with a formatted date using elasticsearch-hadoop/spark?

Within standalone Spark, I'm trying to write from a DataFrame to Elasticsearch. While I can get that to work, what I can't figure out is how to write to a dynamically named index that is formatted like 'index_name-{ts_col:{YYYY-mm-dd}}', where…

python apache-spark elasticsearch-hadoop

asked Feb 24 '17 at 18:32

Jim

1 answer

Inserting arrays in Elasticsearch via PySpark

I have a case much like this one: Example DataFrame: from pyspark.sql.types import * schema = StructType([ # schema StructField("id", StringType(), True), StructField("email", ArrayType(StringType()), True)]) df =…

apache-spark elasticsearch pyspark elasticsearch-hadoop

asked Feb 08 '17 at 13:31

dtj

1 answer

Ingesting data into Elasticsearch from HDFS: cluster setup and usage

I am setting up a Spark cluster. I have HDFS data nodes and Spark master nodes on the same instances. The current setup is 1 master (Spark and HDFS) and 6 Spark workers and HDFS data nodes. All instances are the same: 16 GB, dual core (unfortunately). I have 3…

hadoop elasticsearch apache-spark cluster-computing elasticsearch-hadoop

asked Dec 22 '16 at 17:33

rohit

1 answer

Insert geographic data in Elasticsearch from Spark

I am trying to upload an RDD with latitude and longitude fields to my ES. I would like to use the geo_point type to plot them on a map. I tried to create a "location" field for each document containing either a string like "12.25, -5.2" or an array of…

scala elasticsearch apache-spark elasticsearch-hadoop

asked Nov 29 '16 at 08:49

Benjamin

1 answer

Elasticsearch hadoop configure bulk batch size

I read, possibly on Stack Overflow, that the es-hadoop / es-spark projects use bulk indexing. If so, is the default batch size the same as BulkProcessor's (5 MB)? Is there any configuration to change this? I am using…

elasticsearch elasticsearch-hadoop elasticsearch-spark

asked Nov 09 '16 at 03:18

rohit

1 answer

Elasticsearch 5.0 and Elasticsearch-Spark connector: what is the correct Maven artefact?

When writing an application to run on Apache Spark 1.6 using the Elasticsearch-Spark connector, the documentation at (https://www.elastic.co/guide/en/elasticsearch/hadoop/5.0/install.html#_minimalistic_binaries) says to use the Maven artefact …

java elasticsearch-hadoop elasticsearch-spark

asked Nov 04 '16 at 18:27

Vladimir Kroz

1 answer

Upgrading to Spark 2.0 dataframe.map

I'm updating some Spark 1.6 code to 2.0.1 and I'm running into some issues using map. I see other questions on SO, like encoder-error-while-trying-to-map-dataframe-row-to-updated-row, but I have not been able to get these techniques to…

apache-spark elasticsearch-hadoop

asked Nov 04 '16 at 14:39

jspooner

1 answer

How to reindex Elasticsearch in parallel

I'm trying to reindex Elasticsearch. I used the Scan and Bulk APIs, but it's very slow. How can I parallelize the process to make it faster? My Python code is as follows: actions=[] for hit in helpers.scan(es,scroll='20m',index=INDEX,doc_type=TYPE,params= …

elasticsearch elasticsearch-hadoop

asked Aug 25 '16 at 14:19

Jack

1 answer

How to get term vectors using Elasticsearch Hadoop

I'm using the Elasticsearch-Hadoop API, and I was trying to get _mtermvectors using the following Spark code: val query= """_mtermvectors { "ids" : ["1256"], "parameters": { "fields": [ "tname" …

scala elasticsearch apache-spark elasticsearch-hadoop

asked Jun 29 '16 at 15:16

Jack

1 answer

How can elasticsearch-hadoop create two RDDs based on different ES clusters?

I need to join two RDDs from two different ES clusters, but I found I can only create one SparkConf and SparkContext based on one ES cluster. For example, the code is as follows: var sparkConf: SparkConf = new SparkConf() sparkConf.set("es.nodes",…

elasticsearch apache-spark elasticsearch-hadoop

asked May 24 '16 at 19:03

Jack

0 answers

Elasticsearch document count is lower than the number indexed when using elasticsearch-hadoop-2.2

I created an index and indexed data into it using elasticsearch-hadoop-2.2. The HQL looks like this: CREATE EXTERNAL TABLE es_external_table ( field1 type1, field2 type2 ) STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler' TBLPROPERTIES…

elasticsearch elasticsearch-hadoop

asked May 24 '16 at 06:54

Longxing Wei

1 answer

ResourceManager stuck in ACCEPTED state

I am trying to integrate ES version 2.2.0 with Hadoop HDFS. In my environment, I have 1 master node and 1 data node. ES is installed on my master node. But while integrating it with HDFS, my ResourceManager application jobs get stuck in…

hadoop elasticsearch elasticsearch-hadoop

asked May 17 '16 at 07:28

krishna kumar

2 answers

Extracting data from documents stored in HDFS to index in Elasticsearch

I have an HDFS archive that stores a variety of documents such as PDF, MS Word, PPT, CSV, etc. I would like to build a platform using Elasticsearch to search the file or text contents. I know I can use the es-hadoop plugin to index data from HDFS to ES.…

hadoop elasticsearch full-text-search elasticsearch-hadoop

asked Apr 05 '16 at 07:18

Sachin

1 answer

mvn package elasticsearch-spark error

I have a Maven project that uses es-spark to read from Elasticsearch. My pom.xml looks like: com.jzdata.logv es-spark 0.0.1-SNAPSHOT jar …

maven elasticsearch apache-spark elasticsearch-hadoop

asked Jan 28 '16 at 07:55

fmyblack
