
We are using the ES-Hadoop connector to push data from a Hadoop HBase table into an Elasticsearch cluster. The cluster details are below, followed by a simplified sketch of our write stage.

  • Elasticsearch version: 2.3.5
  • Data nodes: 3
  • Master nodes: 3
  • Client node: 1

The three data nodes are master-eligible as well.

  • Data/master node heap: 20 GB
  • Client node heap: 3 GB
  • Primary shards per index: 5
  • Replica shards per index: 1
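
For reference, our write stage looks roughly like the sketch below (host and index names are placeholders, and the es.batch.* values shown are the ES-Hadoop defaults):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.elasticsearch.spark._

    // Placeholder host/index names; the es.batch.* settings control bulk
    // sizing and the retry behaviour that ends in the "Bailing out" error.
    val conf = new SparkConf()
      .setAppName("hbase-to-es")
      .set("es.nodes", "es-client-host")       // our single client node
      .set("es.port", "9200")
      .set("es.batch.size.entries", "1000")    // max docs per bulk request
      .set("es.batch.size.bytes", "1mb")       // max bytes per bulk request
      .set("es.batch.write.retry.count", "3")  // retries before bailing out
      .set("es.batch.write.retry.wait", "10s") // wait between retries
    val sc = new SparkContext(conf)

    // In the real job this RDD comes from an HBase scan (elided here)
    val docs = sc.makeRDD(Seq(Map("id" -> 1, "value" -> "sample")))
    docs.saveToEs("our-index/our-type")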

When we execute jobs on Spark, the stages that push data from Hadoop into Elasticsearch start failing after some time with ES-Hadoop's "Could not write all entries ... Bailing out..." error.

We suspect that the Spark executors open more concurrent bulk connections than Elasticsearch can process, and that once this limit is exceeded Elasticsearch starts rejecting the write requests.
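
One signal we know how to look at is the bulk thread pool on each node. A minimal check, assuming the default HTTP port and a placeholder host name:

    import scala.io.Source

    // Bulk thread pool stats per node (ES 2.x _cat API); bulk.rejected
    // counts bulk requests dropped because the queue was full.
    val url = "http://es-client-host:9200/_cat/thread_pool?v" +
      "&h=host,bulk.active,bulk.queue,bulk.rejected"
    println(Source.fromURL(url).mkString)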

How can we determine how many concurrent bulk API connections the Elasticsearch client node can process while still writing data successfully, and what should the maximum number of documents per bulk request be?

Which parameters should we look at to optimise the Elasticsearch cluster for write operations, given that we need to index 80-90 GB of data per hour?
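
One index-level tweak we are considering is disabling refresh and replicas during the load and restoring them afterwards, roughly as sketched below (host and index names are placeholders), but we are not sure what else matters:

    import java.net.{HttpURLConnection, URL}

    // Placeholder host/index names. Common bulk-load pattern: turn off
    // refresh and replicas while loading, then restore them.
    def putSettings(json: String): Unit = {
      val conn = new URL("http://es-client-host:9200/our-index/_settings")
        .openConnection().asInstanceOf[HttpURLConnection]
      conn.setRequestMethod("PUT")
      conn.setDoOutput(true)
      val out = conn.getOutputStream
      out.write(json.getBytes("UTF-8"))
      out.close()
      println(s"HTTP ${conn.getResponseCode}")
    }

    putSettings("""{"index": {"refresh_interval": "-1", "number_of_replicas": 0}}""")
    // ... run the Spark load ...
    putSettings("""{"index": {"refresh_interval": "1s", "number_of_replicas": 1}}""")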
