I have an Elasticsearch cluster that holds a large amount of data, and I want to extract all of it into Hadoop (Hive). I tried the Elasticsearch-Hadoop connector, reading from Elasticsearch through a Hive external table, but it is too slow and the task always fails.
My first problem is getting all the existing data out of my Elasticsearch cluster. My second problem is copying the data that keeps streaming into Elasticsearch over to HDFS on a schedule, once a day or once an hour.
How can I achieve both of these?
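To make the second requirement concrete, here is a minimal sketch of the kind of hourly window I have in mind. It only builds an Elasticsearch range query body for the last complete hour; it does not run the export. The function name `hourly_window_query` is hypothetical, and the `@timestamp` field is an assumption about how the documents are timestamped.

```python
from datetime import datetime, timedelta, timezone


def hourly_window_query(now, field="@timestamp"):
    """Build a range query body covering the last complete hour before `now`.

    `field` is an assumed timestamp field name; adjust it to the real mapping.
    """
    # Truncate to the top of the current hour, then step back one hour.
    end = now.replace(minute=0, second=0, microsecond=0)
    start = end - timedelta(hours=1)
    return {
        "query": {
            "range": {
                field: {
                    # Half-open interval so consecutive runs do not overlap.
                    "gte": start.isoformat(),
                    "lt": end.isoformat(),
                }
            }
        }
    }


if __name__ == "__main__":
    print(hourly_window_query(datetime.now(timezone.utc)))
```

Each scheduled run would export only the documents matching that window, so successive runs should neither overlap nor leave gaps, assuming documents are indexed with a timestamp close to their arrival time.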
Thanks in advance.