
At our university we have an Elasticsearch cluster with 1 node. Now we have money to install more powerful servers. We produce 7–10 million access logs per day.

What is better: to create a cluster with

a. 3 powerful servers, each with 64 GB RAM, 16 CPUs and SSDs, or
b. 14 less powerful servers, each with 32 GB RAM, 8 CPUs and SSDs?

PS: a & b have the same price.

c. Or maybe some other recommendation?

Thank you in advance

petrolis
  • From what I know, option A is better (64 GB is the biggest that is recommended, and I have seen recommendations not to pass 32 GB of heap). But I'm not sure it's that unambiguous; it depends a lot on your queries – tomas Dec 01 '18 at 00:49
  • Option A is a better cluster config – ben5556 Dec 01 '18 at 02:54
  • You could probably PoC it on AWS or Azure, it would cost you like 50 bucks, and then make a decision based on that – sramalingam24 Dec 01 '18 at 20:15

1 Answer


It depends on the scenario. For the logging case you're describing, option b seems more flexible to me. Let me explain my opinion:

  1. As you are in a logging scenario, implement the hot/warm architecture. You'll mainly write to and read from recent indices. Only in a few cases will you want to access older data, and you'll probably want to shrink old indices and close even older ones.

  2. Set up at least 3 master-eligible nodes to prevent split-brain problems. Configure the same nodes also as coordinating nodes (11 nodes left).

  3. Install 2 ingest nodes to move the ingestion workload to dedicated nodes (9 nodes left).

  4. Install 3 hot data nodes for storing the most recent indices (6 nodes left).

  5. Install 6 warm data nodes for holding older, shrunk and closed indices (0 nodes left).
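The role layout above can be sketched in each node's elasticsearch.yml. This is a minimal sketch for the Elasticsearch 6.x settings style; the `box_type` attribute name is just a common hot/warm convention, not a built-in:

```yaml
# master-eligible / coordinating node (3 of these)
node.master: true
node.data: false
node.ingest: false
discovery.zen.minimum_master_nodes: 2   # majority of 3 masters, against split brain

# dedicated ingest node (2 of these)
node.master: false
node.data: false
node.ingest: true

# hot data node (3 of these)
node.master: false
node.data: true
node.ingest: false
node.attr.box_type: hot

# warm data node (6 of these)
node.master: false
node.data: true
node.ingest: false
node.attr.box_type: warm
```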

The previous setup is just an example. The node numbers/roles should be changed if:

  1. You need more resiliency. Then add more master nodes and increase the replica count for the indices. This will also reduce the total capacity.

  2. The more old data you need to keep searchable or held in already closed indices, the more warm nodes you'll need. Rebalance the hot/warm node count according to your needs. If you can drop your old data early, increase the hot node count instead.

  3. You have an X-Pack license. Then consider installing ML/alerting nodes. Add these roles to the master nodes or reduce the data node count in favor of ML/alerting.

  4. You need Kibana/Logstash. Depending on the workload, prepare one or two nodes exclusively for them.
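To make the hot/warm split work, new daily indices get pinned to hot nodes via an index template, and are later moved by updating a single allocation setting. A sketch in Kibana Dev Tools syntax; the template name, index pattern and `box_type` attribute are assumptions matching the config above:

```
PUT _template/accesslogs
{
  "index_patterns": ["accesslogs-*"],
  "settings": {
    "index.routing.allocation.require.box_type": "hot",
    "index.number_of_shards": 3,
    "index.number_of_replicas": 1
  }
}

# once an index is old enough, relocate its shards to the warm tier
PUT accesslogs-2018.11.01/_settings
{
  "index.routing.allocation.require.box_type": "warm"
}
```

Tools like Curator can run the second call on a schedule, so old indices drift to the warm nodes automatically.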

Assuming both options use the same mainboards, you have more potential to quickly scale the 14 boxes up just by adding more RAM/CPU/storage. With the 3 nodes already maxed out on specs, you'd need to set up new boxes and join them to the cluster in order to scale up. On the other hand, that approach may give you more recent hardware in your rack over time.

Please also have a look at this: https://www.elastic.co/pdf/architecture-best-practices.pdf

If you need some background on sharding configuration, please see ElasticSearch - How does sharding affect indexing performance?

BTW: tomas is right with his comment about the heap size. Please have a look at this if you want to know the background: https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html
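The rule of thumb from that page: set the minimum and maximum heap to the same value, keep it at no more than half the machine's RAM, and stay below roughly 32 GB so the JVM can still use compressed object pointers. On a 64 GB box from option a, the jvm.options fragment might look like this (30g is an assumed safe value, not a measured one):

```
-Xms30g
-Xmx30g
```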

ibexit