
We are using the ES-Hadoop connector to push data from a Hadoop HBase table into an Elasticsearch cluster. The cluster details are below, followed by a simplified sketch of our write stage.

  • Elasticsearch version: 2.3.5
  • Data nodes: 3
  • Master nodes: 3
  • Client node: 1

The three data nodes are master-eligible as well.

  • Data/master node heap: 20 GB
  • Client node heap: 3 GB
  • Primary shards per index: 5
  • Replica shards per index: 1
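
For reference, our write stage looks roughly like the sketch below (host and index names are placeholders, and the es.batch.* values shown are the ES-Hadoop defaults):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.elasticsearch.spark._

    // Placeholder host/index names; the es.batch.* settings control bulk
    // sizing and the retry behaviour that ends in the "Bailing out" error.
    val conf = new SparkConf()
      .setAppName("hbase-to-es")
      .set("es.nodes", "es-client-host")       // our single client node
      .set("es.port", "9200")
      .set("es.batch.size.entries", "1000")    // max docs per bulk request
      .set("es.batch.size.bytes", "1mb")       // max bytes per bulk request
      .set("es.batch.write.retry.count", "3")  // retries before bailing out
      .set("es.batch.write.retry.wait", "10s") // wait between retries
    val sc = new SparkContext(conf)

    // In the real job this RDD comes from an HBase scan (elided here)
    val docs = sc.makeRDD(Seq(Map("id" -> 1, "value" -> "sample")))
    docs.saveToEs("our-index/our-type")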

When we execute jobs on Spark, the stages that push data from Hadoop into Elasticsearch start failing after some time with ES-Hadoop's "Could not write all entries ... Bailing out..." error.

We suspect that the Spark executors open more concurrent bulk connections than Elasticsearch can process, and that once this limit is exceeded Elasticsearch starts rejecting the write requests.
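
One signal we know how to look at is the bulk thread pool on each node. A minimal check, assuming the default HTTP port and a placeholder host name:

    import scala.io.Source

    // Bulk thread pool stats per node (ES 2.x _cat API); bulk.rejected
    // counts bulk requests dropped because the queue was full.
    val url = "http://es-client-host:9200/_cat/thread_pool?v" +
      "&h=host,bulk.active,bulk.queue,bulk.rejected"
    println(Source.fromURL(url).mkString)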

How can we determine how many concurrent bulk API connections the Elasticsearch client node can process while still writing data successfully, and what should the maximum number of documents per bulk request be?

Which parameters should we look at to optimise the Elasticsearch cluster for write operations, given that we need to index 80-90 GB of data per hour?
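
One index-level tweak we are considering is disabling refresh and replicas during the load and restoring them afterwards, roughly as sketched below (host and index names are placeholders), but we are not sure what else matters:

    import java.net.{HttpURLConnection, URL}

    // Placeholder host/index names. Common bulk-load pattern: turn off
    // refresh and replicas while loading, then restore them.
    def putSettings(json: String): Unit = {
      val conn = new URL("http://es-client-host:9200/our-index/_settings")
        .openConnection().asInstanceOf[HttpURLConnection]
      conn.setRequestMethod("PUT")
      conn.setDoOutput(true)
      val out = conn.getOutputStream
      out.write(json.getBytes("UTF-8"))
      out.close()
      println(s"HTTP ${conn.getResponseCode}")
    }

    putSettings("""{"index": {"refresh_interval": "-1", "number_of_replicas": 0}}""")
    // ... run the Spark load ...
    putSettings("""{"index": {"refresh_interval": "1s", "number_of_replicas": 1}}""")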
