I'm slightly confused about how my Elasticsearch cluster will handle traffic. I have several EC2 instances joined in a cluster, and my application connects to the cluster via the IP of one of those instances. I know this node can reach all the others in the cluster and forward requests appropriately, but won't that particular instance become overburdened, since all of the traffic is initially directed at that one node? Do I have to put a load balancer in front and point the application at that, or am I misunderstanding something?

Thanks! :)

dan martin

2 Answers

Your concern is legitimate. In my experience, however, the client should be aware of multiple nodes and distribute the load itself, without the need for a load balancer.

See this client config example for Ruby: Multiple nodes and retry on failure
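
To make the idea concrete, here is a minimal Python sketch of what a multi-node-aware client might do: rotate through the known nodes, retry a failed request on a different node, and skip a failing node for a while. It is not the Ruby client's actual implementation. `NodePool`, `fake_send`, the 30-second cooldown, and the `10.0.0.x` addresses are all hypothetical names and values chosen for illustration.

```python
import itertools
import time


class NodePool:
    """Rotate over nodes round-robin, skipping ones that failed recently."""

    def __init__(self, nodes, cooldown=30.0, clock=time.monotonic):
        self.nodes = list(nodes)
        self.cooldown = cooldown      # seconds a failed node is skipped (assumed value)
        self.clock = clock
        self.dead_until = {}          # node -> time at which it may be tried again
        self._cycle = itertools.cycle(self.nodes)

    def _next_live(self):
        now = self.clock()
        # Look at each node at most once per call.
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            if self.dead_until.get(node, float("-inf")) <= now:
                return node
        # Every node is cooling down: try the next one anyway.
        return next(self._cycle)

    def request(self, send, max_attempts=3):
        """Call send(node); on ConnectionError, retry on another node."""
        last_error = None
        for _ in range(max_attempts):
            node = self._next_live()
            try:
                return send(node)
            except ConnectionError as exc:
                # Take the node out of rotation for the cooldown period.
                self.dead_until[node] = self.clock() + self.cooldown
                last_error = exc
        raise last_error


def fake_send(node):
    # Stand-in for a real HTTP call; pretends one node is down.
    if node == "10.0.0.2:9200":
        raise ConnectionError("node down")
    return node


pool = NodePool(["10.0.0.1:9200", "10.0.0.2:9200", "10.0.0.3:9200"])
served_by = [pool.request(fake_send) for _ in range(6)]
```

In this run the requests end up spread over the two healthy nodes, and the failing node is only contacted once before it is put on cooldown. A real client library would usually handle this for you, so a sketch like this is mainly useful for understanding the behaviour.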

Andreyy
  • That makes sense. What do you think about keeping an array of Elasticsearch nodes in the client and using a random number generator to pick one for each request? For example, with [node1, node2, node3], the random number generator picks 2, so the client hits node2 for this particular search. – dan martin Nov 06 '16 at 19:41
  • That sounds OK, but I would expect a client library to do that for you. A good client library would also retry a failed query on a different node and, beyond that, remove a node from the array for some time if it keeps failing. – Andreyy Nov 08 '16 at 13:39
  • Upvoted, but you should write "clients HAVE to be aware". If not properly coded, clients will overload one server and unbalance the cluster, which results in a global waste of resources. – bokan Nov 21 '18 at 16:53
No, it's not necessary. Elasticsearch already handles load balancing for you by sharding and replicating search index data amongst different nodes in the cluster.

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/_basic_concepts.html#_shards_amp_replicas

Possible duplicate: Is using a load balancer with ElasticSearch unnecessary?

ck1
  • Thanks for the answer, but I already know about this behaviour. What I'm concerned about is that I'm using the IP address of a single node in the cluster for all of my requests. That node will, of course, route the requests appropriately, but it is still the one node being hit directly by every request, and therefore the one node that has to handle all of the traffic throughput, no? – dan martin Nov 05 '16 at 13:45
  • If you're using the `TransportClient`, you can configure it with multiple IP addresses for the nodes in your cluster, and it will round robin requests among them. It's also possible to use `client.transport.sniff=true`, in which case data nodes will be automatically discovered, and requests will be load balanced among them. – ck1 Nov 06 '16 at 20:11
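
As a rough illustration of the sniffing idea in the last comment, the Python sketch below extracts the HTTP addresses of data nodes from a cluster node listing, which a client could then rotate over. It is not how `TransportClient` is implemented. The `data_node_addresses` helper is hypothetical, and the shape of `sample_response` (a `nodes` map with `roles` and `http.publish_address` fields) is an assumption loosely modeled on the Nodes Info API, with made-up node IDs and addresses.

```python
def data_node_addresses(nodes_info):
    """Return the sorted HTTP addresses of nodes that hold data."""
    addresses = []
    for node in nodes_info.get("nodes", {}).values():
        # Skip nodes without the data role (e.g. dedicated masters).
        if "data" not in node.get("roles", []):
            continue
        http = node.get("http")
        if http and "publish_address" in http:
            addresses.append(http["publish_address"])
    return sorted(addresses)


# Hypothetical response; in practice this would come from querying the cluster.
sample_response = {
    "cluster_name": "example",
    "nodes": {
        "node-a": {"roles": ["master"],
                   "http": {"publish_address": "10.0.0.1:9200"}},
        "node-b": {"roles": ["data", "ingest"],
                   "http": {"publish_address": "10.0.0.2:9200"}},
        "node-c": {"roles": ["data"],
                   "http": {"publish_address": "10.0.0.3:9200"}},
    },
}

hosts = data_node_addresses(sample_response)
```

The client only needs one reachable seed address to fetch such a listing, after which requests can be balanced across every address in `hosts` rather than all landing on the seed node.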