
Is it possible to configure an Elasticsearch cluster so that a small index (around 100 MB) is available in its entirety on every node in the cluster?

Basically, we want to use Elasticsearch like an in-memory cache on each of our application servers. When an application queries the index, the query should be served by the node on the same machine and never leave it, so we avoid network overhead (the network here is somewhat slow and prone to problems).

This index is high-read, low-write: there are maybe 50 writes a day, if that, because the data is not volatile at all (think mostly configuration-type data).

After reading quite a bit about setting up clusters, I still can't work out whether you can force an entire index to be fully available on every node, or whether that is simply not possible because Elasticsearch always tries to distribute the data across separate nodes.

If this is possible, what would the configuration be for a 40-node cluster? One shard per node plus one replica?
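A minimal sketch of what such a configuration could look like, assuming all 40 nodes are data nodes and the index uses a single primary shard: set `number_of_replicas` to the node count minus one, which gives 40 copies in total. Elasticsearch never allocates two copies of the same shard to the same node, so 40 copies on 40 nodes should end up as one full copy per node, provided nothing else (for example disk watermarks) blocks allocation. The helper `full_copy_index_settings` is hypothetical and only builds the JSON body for a create-index request; the `auto_expand_replicas: "0-all"` variant is shown as an alternative that lets the replica count follow the number of data nodes.

```python
import json


def full_copy_index_settings(node_count: int, auto_expand: bool = False) -> dict:
    """Build a create-index body (hypothetical helper) that aims to keep a
    complete copy of a single-shard index on every data node."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if auto_expand:
        # Let Elasticsearch grow/shrink the replica count as data nodes join or leave.
        index_settings = {"number_of_shards": 1, "auto_expand_replicas": "0-all"}
    else:
        # One primary + (node_count - 1) replicas = node_count copies,
        # and no node may hold two copies of the same shard.
        index_settings = {"number_of_shards": 1, "number_of_replicas": node_count - 1}
    return {"settings": {"index": index_settings}}


if __name__ == "__main__":
    # Body for a 40-node cluster: 1 primary shard, 39 replicas.
    print(json.dumps(full_copy_index_settings(40), indent=2))
```

The resulting body would be sent as `PUT /<index-name>` with any HTTP client. With a fixed replica count, the setting has to be updated whenever nodes are added or removed; `auto_expand_replicas` avoids that bookkeeping.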

Any information or ideas on this would be much appreciated.

Regards,

Craig

Craig Koster
  • You would need to set the number of replicas to the number of nodes minus one (`number_of_replicas = number of nodes - 1`). Then you would have to query using the `_only_node` preference (see https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-preference.html). If I may ask, why would you spend so much energy on an ES cluster to do that? I'm also not sure whether ES will distribute shards evenly, so you might have to force awareness zones so that shards are well distributed across your cluster (https://www.elastic.co/guide/en/elasticsearch/reference/current/allocation-awareness.html). – Adonis Aug 08 '17 at 12:45
  • Thank you for those recommendations. To answer your question: we're spending the time and energy on this because the client I am working for has extremely poor infrastructure and support, yet we are expected to build stable, high-performance apps on top of that infrastructure, which is very difficult. We are going this route to mitigate the network and database performance and reliability problems we've seen. It's not all mitigation, though: having our data in Elastic makes it easy for developers to write custom queries against operational data that would be difficult to support via SQL. – Craig Koster Aug 08 '17 at 13:09
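To keep each query on the machine that issued it, the application can send its requests to the Elasticsearch node on its own host and set the `preference` search parameter, along the lines of the comment above. A minimal sketch, assuming the local node listens on `http://localhost:9200` and using a hypothetical index name `config`: `_local` asks Elasticsearch to prefer shard copies held by the node that received the request, while `_only_node:<node-id>` restricts execution to one named node. The helper `local_search_url` is hypothetical and only builds the URL.

```python
from typing import Optional
from urllib.parse import urlencode


def local_search_url(index: str,
                     node_id: Optional[str] = None,
                     host: str = "http://localhost:9200") -> str:
    """Build a _search URL (hypothetical helper) that keeps execution local.

    Assumes the application talks to the Elasticsearch node running on its
    own machine, so that node acts as the coordinating node.
    """
    if node_id is None:
        # Prefer shard copies on the node that receives the request.
        preference = "_local"
    else:
        # Restrict execution to one explicitly named node.
        preference = f"_only_node:{node_id}"
    # urlencode percent-encodes the ':' in "_only_node:<id>" as %3A.
    return f"{host}/{index}/_search?{urlencode({'preference': preference})}"


if __name__ == "__main__":
    print(local_search_url("config"))
    print(local_search_url("config", node_id="node-1"))
```

The query body is then sent to that URL as JSON. Of the two, `_local` is the more forgiving choice: if no local copy is available (for example while a replica is still initializing), Elasticsearch falls back to copies on other nodes instead of failing.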
