Unbalanced Elasticsearch Performance

Question

We have an Elasticsearch cluster with 9 nodes with the following settings:

Elasticsearch Version 5.1.2
One Index in Cluster
Primary Shard Storage Size: 3GB
Number of Shards: 5
Number of Replicas: 3
Node-1, Node-2 and Node-3 Master Only Nodes
Node-4 through Node-9 Data Only Nodes
No Parent Child Relationship in Mappings
Each Node: 24 GB of RAM, 18 CPU Cores
Swap Disabled, Increased Open File Descriptor Limit, 12 GB JVM Heap Memory
Nest Client 'Static' Adaptor And List of all Nodes IPs

As you can see, our nodes are over-provisioned, but under a stress test only one node uses all of its available search threads. As mentioned, each node has 18 cores, so with the default search thread pool size we have (3*18/2)+1 = 28 search threads on each node.
Problems:

HTTP requests are not balanced
Other nodes don't use all their search threads. One node uses all of its threads and its search queue grows large

What we have tested:
- Using one coordinator node to balance requests (no change)
How we send requests:
- We use Elasticsearch as a search engine, and JMeter is used to stress test the search services. The test services are web services that call some search templates using the Elasticsearch NEST client

Any idea is appreciated.

score 0 · Answer 1 · answered Jul 09 '17 at 09:12

Have a read of https://www.elastic.co/guide/en/elasticsearch/client/net-api/current/connection-pooling.html

Looks like you're using the SingleNodeConnectionPool, which is what you get with the low ceremony ElasticClient constructor, i.e. `var client = new ElasticClient(uri);`. In this case, all your requests will be sent to one node, which has to act as the coordinating node described here:

https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-node.html

A search request, for example, is executed in two phases which are coordinated by the node which receives the client request — the coordinating node.

In the scatter phase, the coordinating node forwards the request to the data nodes which hold the data. Each data node executes the request locally and returns its results to the coordinating node. In the gather phase, the coordinating node reduces each data node’s results into a single global resultset.

Every node is implicitly a coordinating node. This means that a node that has all three node.master, node.data and node.ingest set to false will only act as a coordinating node, which cannot be disabled. As a result, such a node needs to have enough memory and CPU in order to deal with the gather phase.

StaticConnectionPool or SniffingConnectionPool would be a better choice for your cluster.
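As a rough sketch of what that could look like with NEST, the snippet below builds a client on top of a StaticConnectionPool seeded with the six data nodes. The host names (`node-4` through `node-9`), port 9200, and the `SearchClientFactory` helper are assumptions for illustration only, not taken from the question; substitute your real addresses. Swapping in `SniffingConnectionPool` takes the same list of URIs.

```csharp
using System;
using Elasticsearch.Net;
using Nest;

// Hypothetical helper: builds one client that spreads requests over all data nodes.
public static class SearchClientFactory
{
    public static ElasticClient Create()
    {
        // Assumed addresses for Node-4 .. Node-9; replace with the real ones.
        var nodes = new[]
        {
            new Uri("http://node-4:9200"),
            new Uri("http://node-5:9200"),
            new Uri("http://node-6:9200"),
            new Uri("http://node-7:9200"),
            new Uri("http://node-8:9200"),
            new Uri("http://node-9:9200"),
        };

        // The pool hands out the listed nodes in turn, so successive
        // requests should land on different coordinating nodes.
        var pool = new StaticConnectionPool(nodes);

        // Pass the pool (not a single Uri) to the settings; the Uri
        // constructor falls back to SingleNodeConnectionPool.
        var settings = new ConnectionSettings(pool);

        return new ElasticClient(settings);
    }
}
```

If the web services create a new client or pool per request, it may be worth checking that a single `ConnectionSettings` instance is shared for the lifetime of the application, since the pool keeps its node rotation state per instance.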

Thanks Reza. But as I mentioned in my question we use `Nest Client 'Static' Adaptor And List of all Nodes IPs` — Mohammad Mazraeh, Jul 09 '17 at 09:19
