
I have a 4-node Elasticsearch cluster on EC2. Configuration per node: c5d.large, 3.5 GB memory, 50 GB NVMe instance storage as the data disk. Elasticsearch version: 6.8.21.

I added a 5th machine with the same configuration (c5d.large, 3.5 GB memory, 50 GB NVMe instance storage). Since then, search requests have been taking longer than before. I enabled slow logs, which show that only the shards located on the 5th node are slow to search. I can also see high disk read IO on the new node when I trigger search requests; iowait% increases with the number of search requests and climbs to 90-95%. None of the old nodes show any read spikes.
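To pin down where the reads come from, this is the kind of check I'd run on the new node. The paths and device names below are assumptions (default Elasticsearch data path, typical c5d device layout); one common c5d gotcha worth ruling out is that the NVMe instance store must be formatted and mounted explicitly, otherwise `path.data` silently lands on the small EBS root volume.

```shell
# Diagnostic sketch for the new node; paths and device names are
# assumptions (default Elasticsearch install, typical c5d layout).

# 1. Confirm the NVMe instance store is actually mounted at path.data.
lsblk -o NAME,SIZE,MOUNTPOINT 2>/dev/null || echo "lsblk unavailable"
df -h /var/lib/elasticsearch 2>/dev/null || echo "adjust to your path.data"

# 2. Watch per-device read IO while a slow search runs (needs sysstat).
iostat -x 5 3 2>/dev/null || echo "install sysstat for iostat"

# 3. Ask Elasticsearch what it is busy with during the slow search.
curl -s --max-time 5 'localhost:9200/_nodes/hot_threads?threads=5' \
  || echo "cluster not reachable from this host"
```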

I checked elasticsearch.yml, jvm.options, and even the sysctl -A output. There is no difference between the configuration on the new node and the old nodes.
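Since file-level diffs can miss runtime defaults, the nodes API can also report the effective settings and hardware details as Elasticsearch itself sees them. A sketch (the node names are placeholders):

```shell
# Compare the effective per-node settings reported by Elasticsearch,
# not just the files on disk. "node-5" and "node-1" are placeholders.
curl -s --max-time 5 'localhost:9200/_nodes/node-5,node-1/settings?pretty' \
  > settings.json || echo "cluster not reachable"

# OS / JVM / filesystem details per node (heap, disk type, mount points):
curl -s --max-time 5 'localhost:9200/_nodes/_all/os,jvm,fs?pretty' \
  > nodes_info.json || echo "cluster not reachable"
```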

What could be the issue here?

Amit
dnsh
  • I assume you have shard allocation awareness based on AZ; Try re-provisioning the VM, it may be just a bad one. – Nirmal Apr 27 '22 at 03:24
  • It may sound dumb, but you can try rebooting the VM if you haven't already. I have seen similar Elasticsearch problems resolved by a reboot before. – YD9 Apr 27 '22 at 09:09
  • @YD9 Tried rebooting the VM. It did not resolve the issue. – dnsh Apr 28 '22 at 13:19
  • @Nirmal tried with a different new VM. I am seeing the same issue there. – dnsh Apr 28 '22 at 13:20
  • @dnsh - Are you able to identify a specific query that behaves differently across nodes? Maybe pick an expensive query and run it against just the new node and then an old node; that will give you 100% confidence that the new node is the issue. – Nirmal Apr 29 '22 at 03:14
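Following the suggestion in the comments, the same query can be targeted at a single node with the search `preference` parameter. A sketch (the index name and node IDs are placeholders; real node IDs come from `GET _cat/nodes?full_id=true`):

```shell
# Run an identical query against only one node at a time via "preference".
# INDEX and NEW_NODE_ID are placeholders.
curl -s --max-time 5 -H 'Content-Type: application/json' \
  'localhost:9200/INDEX/_search?preference=_only_nodes:NEW_NODE_ID' \
  -d '{"query": {"match_all": {}}, "size": 0}' \
  || echo "cluster not reachable"
# Repeat with _only_nodes:OLD_NODE_ID and compare the "took" values.
```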

0 Answers