Elasticsearch-Hadoop get Non-indexed data

Question

I have an Elasticsearch cluster that holds a large amount of data, and I want to extract all of it into Hadoop (Hive). I tried the Elasticsearch-Hadoop driver with a Hive external table, but the extraction is too slow and the task always fails.

My first problem is getting all the data out of my existing Elasticsearch cluster. My second problem is copying the data that keeps streaming into Elasticsearch over to HDFS once a day or once an hour.

How can I achieve this?

Thanks in advance.

score 0 · Accepted Answer · answered Apr 10 '15 at 10:01


You can use Hadoop as the warehouse that stores all the data, and push data from there into Elasticsearch and vice versa. Try to keep in Elasticsearch only the data you currently want to analyze, and remove the rest. Then, whenever you want to analyze a different aspect, pull that data from Hadoop and use it.

answered Apr 10 '15 at 10:01

Bhavesh Gadoya


Refer to the Elasticsearch MapReduce API for reading data from Elasticsearch. Try writing custom MR jobs to do the same. – Bhavesh Gadoya Apr 10 '15 at 10:02
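
As a rough illustration of the custom-job idea, one option is to split the export into small time windows so that each job reads only a slice of the index, and to reuse the same windows for the hourly or daily copy. The sketch below (Python, standard library only) just builds the query-DSL strings. It assumes every document carries a timestamp field, here hypothetically named "@timestamp", and the function names export_windows and window_query are made up for this example. Each resulting string could then be passed to the connector as its es.query setting.

```python
import json
from datetime import datetime, timedelta, timezone


def export_windows(start, end, hours=1):
    """Yield consecutive (window_start, window_end) pairs covering [start, end)."""
    step = timedelta(hours=hours)
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        yield cursor, window_end
        cursor = window_end


def window_query(window_start, window_end, field="@timestamp"):
    """Build a query-DSL JSON string matching documents in [window_start, window_end).

    The field name is an assumption; use whatever timestamp field the index has.
    """
    body = {
        "query": {
            "range": {
                field: {
                    "gte": window_start.isoformat(),
                    "lt": window_end.isoformat(),
                }
            }
        }
    }
    return json.dumps(body)


if __name__ == "__main__":
    # Example: three one-hour slices; each string is one job's query.
    day_start = datetime(2015, 4, 10, 0, 0, tzinfo=timezone.utc)
    day_end = day_start + timedelta(hours=3)
    for lo, hi in export_windows(day_start, day_end):
        print(window_query(lo, hi))
```

Running one job per window keeps each task small and makes a failed window easy to retry. Whether this actually fixes the slowness depends on the cluster and has not been tested here.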
