
We have an Elasticsearch cluster with 9 nodes with the following settings:

  • Elasticsearch version 5.1.2
  • One index in the cluster
  • Primary shard storage size: 3 GB
  • Number of shards: 5
  • Number of replicas: 3
  • Node-1, Node-2 and Node-3: master-only nodes
  • Node-4 through Node-9: data-only nodes
  • No parent-child relationships in the mappings
  • Each node: 24 GB of RAM, 18 CPU cores
  • Swapping disabled, open file descriptor limit increased, 12 GB JVM heap
  • NEST client using the 'Static' adaptor with a list of all node IPs
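With 5 primary shards and 3 replicas, the index has 5 × (1 + 3) = 20 shard copies to place on the six data nodes (Node-4 through Node-9), and a search over the whole index touches one copy of each of the 5 shards. The hypothetical helpers below just do that arithmetic. They assume the copies are spread as evenly as possible, which Elasticsearch's shard allocator aims for but does not guarantee.

```python
from typing import List


def shard_copies(primaries: int, replicas: int) -> int:
    """Total shard copies: each primary plus its `replicas` extra copies."""
    return primaries * (1 + replicas)


def even_spread(copies: int, data_nodes: int) -> List[int]:
    """Hypothetical ideal placement: spread `copies` as evenly as possible."""
    base, extra = divmod(copies, data_nodes)
    # The first `extra` nodes each receive one additional copy.
    return [base + 1 if i < extra else base for i in range(data_nodes)]


total = shard_copies(primaries=5, replicas=3)
print(total)                  # 20
print(even_spread(total, 6))  # [4, 4, 3, 3, 3, 3]
```

So even in the ideal case two data nodes hold one more shard copy than the others.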

As you can see, our nodes have more resources than they need, yet under a stress test only one node uses all of its available search threads. As mentioned, each node has 18 cores, so with the default search thread pool size every node has (3*18/2)+1 = 28 search threads.
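The thread count above follows from a simple formula. The sketch below reproduces it, assuming the Elasticsearch 5.x default for the fixed `search` thread pool, int((available processors × 3) / 2) + 1, and assuming the JVM actually sees all 18 cores on each node.

```python
def search_pool_size(available_processors: int) -> int:
    # Assumed Elasticsearch 5.x default size of the "search" thread pool:
    # int((available_processors * 3) / 2) + 1
    return (available_processors * 3) // 2 + 1


per_node = search_pool_size(18)
print(per_node)      # 28
print(per_node * 6)  # 168 search threads across the six data nodes
```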
Problems:

  • HTTP requests are not balanced across the nodes
  • One node uses all of its search threads and its search queue grows large, while the other nodes don't use all of theirs

What we have tested:
- Using one coordinator node to balance requests (no change)

How we send requests:
- We use Elasticsearch as a search engine, and JMeter is used to stress test our search services. The services under test are web services that call some search templates through the Elasticsearch NEST client.

[Screenshot: Open HTTP requests and search thread pool]

[Screenshot: CPU usage]

[Screenshot: Query count and fetch count]

Any idea is appreciated.

Mohammad Mazraeh

1 Answer


Have a read of https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/connection-pooling.html

Looks like you're using the SingleNodeConnectionPool, which is what you get with the low-ceremony ElasticClient constructor, i.e. var client = new ElasticClient(uri). In this case all your requests are sent to one node, which then has to act as the coordinating node described here:

https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html

A search request, for example, is executed in two phases which are coordinated by the node which receives the client request — the coordinating node.

In the scatter phase, the coordinating node forwards the request to the data nodes which hold the data. Each data node executes the request locally and returns its results to the coordinating node. In the gather phase, the coordinating node reduces each data node’s results into a single global resultset.

Every node is implicitly a coordinating node. This means that a node that has all three node.master, node.data and node.ingest set to false will only act as a coordinating node, which cannot be disabled. As a result, such a node needs to have enough memory and CPU in order to deal with the gather phase.
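The scatter/gather flow quoted above boils down to a top-k merge on the coordinating node. Below is a simplified Python model with made-up shard contents: in the scatter phase each shard returns only its local top hits, and in the gather phase the coordinating node keeps the global top `size`. Real Elasticsearch searches split this into a query phase followed by a separate fetch phase (hence the separate query and fetch counts in the question's graphs), which this sketch does not model.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, doc_id); hypothetical hit representation


def scatter(shards: Dict[str, List[Hit]], size: int) -> List[List[Hit]]:
    """Scatter: each shard executes locally and returns its own top `size` hits."""
    return [heapq.nlargest(size, hits) for hits in shards.values()]


def gather(per_shard: List[List[Hit]], size: int) -> List[Hit]:
    """Gather: the coordinating node reduces shard results to one global top `size`."""
    merged = (hit for hits in per_shard for hit in hits)
    return heapq.nlargest(size, merged)


shards = {
    "shard-0": [(0.9, "a"), (0.2, "b"), (0.5, "c")],
    "shard-1": [(0.8, "d"), (0.7, "e")],
    "shard-2": [(0.95, "f"), (0.1, "g")],
}
print(gather(scatter(shards, 2), 2))  # [(0.95, 'f'), (0.9, 'a')]
```

All of the gather work lands on whichever node received the request, which is why that node needs enough CPU and memory.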

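StaticConnectionPool or SniffingConnectionPool would be a better choice for your cluster, since both round-robin requests over the nodes in the pool instead of sending everything to a single node.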
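To make the difference concrete, here is a small language-neutral simulation (Python, not C#/NEST) of where requests land with a single-node pool versus a round-robin pool. The node names, request counts and dead-node handling are simplified assumptions; the real NEST pools also deal with retries and with bringing dead nodes back, which this sketch ignores.

```python
from collections import Counter
from itertools import cycle

# Toy model of request routing; NOT NEST's implementation.
# Hypothetical names standing in for the six data nodes (Node-4 .. Node-9).
NODES = [f"node-{i}" for i in range(4, 10)]


def route_single(nodes, n_requests):
    """Single-node pool: every request goes to the one configured node."""
    return Counter(nodes[0] for _ in range(n_requests))


def route_round_robin(nodes, n_requests, dead=frozenset()):
    """Round-robin pool: rotate through the nodes, skipping any marked dead."""
    live = [n for n in nodes if n not in dead]
    if not live:
        raise RuntimeError("no live nodes in the pool")
    rotation = cycle(live)
    return Counter(next(rotation) for _ in range(n_requests))


print(route_single(NODES, 600))                        # Counter({'node-4': 600})
print(route_round_robin(NODES, 600))                   # 100 requests per node
print(route_round_robin(NODES, 500, dead={"node-5"}))  # 100 per live node, none to node-5
```

With the single-node pool one node does all the coordinating work, matching the "one busy node" symptom; round-robin spreads that work across the cluster.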

Reza