
I have 6 instances of type m3.large.elasticsearch and storage type instance.

[screenshot: domain configuration]

I don't really understand what Average, Minimum, and Maximum mean here.

I am not getting any logs into my cluster right now, although FreeStorageSpace shows as 14.95 GB here:

[screenshot: FreeStorageSpace metric showing 14.95 GB]

But my FreeStorageSpace graph for "Minimum" has reached zero!

[screenshot: FreeStorageSpace "Minimum" graph reaching zero]

What is happening here?

tedder42
Karup
  • Can you post some Elasticsearch logs or any errors that you are getting? What kind of setup do you use for sending data to Elasticsearch? Which index pattern are you using in Kibana, and which index does it correspond to in Elasticsearch? Please post more details. – Mrunal Pagnis Jul 15 '16 at 07:31

2 Answers


I was also confused by this. Minimum is the free space on a single data node, namely the one with the least free space. Sum is the free space of the entire cluster (the free space of all data nodes added together). I got this info from the following link:

http://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-managedomains.html
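To make the statistics concrete, here is a minimal sketch with made-up per-node numbers (the question's cluster has 6 data nodes; the values below are hypothetical):

```python
# Hypothetical per-node FreeStorageSpace values, in MB, for 6 data nodes.
free_per_node_mb = [5000, 12000, 3000, 9000, 15000, 7000]

minimum = min(free_per_node_mb)                      # node with the least free space
maximum = max(free_per_node_mb)                      # node with the most free space
average = sum(free_per_node_mb) / len(free_per_node_mb)
total = sum(free_per_node_mb)                        # free space of the whole cluster

print(minimum, maximum, average, total)  # 3000 15000 8500.0 51000
```

So a cluster can report plenty of free space in total while one node (Minimum) is already full.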

jigar

We ran into the same confusion. Avg, Min, and Max are calculated across individual nodes, while Sum combines the free/used space of the whole cluster.

We had assumed that Average FreeStorageSpace meant the average free storage space of the whole cluster, and set an alarm with the following calculation in mind:

  1. Per day index = 1 TB
  2. Max days to keep indices = 10

Hence we had an average utilization of 10 TB at any point in time. Assuming we would grow 2x, i.e. to 20 TB, our actual storage need per https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/sizing-domains.html#aes-bp-storage, with a replication factor of 2, was:

(20 * 2 * 1.1 / 0.95 / 0.8) = 57.89 =~ 60 TB

So we provisioned 18 x 3.8 TB instances (~68 TB) to accommodate the 2x target of 60 TB.
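The sizing arithmetic above can be reproduced as a quick sanity check (the 1.1, 0.95, and 0.8 factors are the index-overhead, usable-disk, and headroom factors from the AWS sizing guidance referenced above):

```python
# Storage sizing from the answer: source data * replicas * overhead factors.
source_data_tb = 20          # 2x of the 10 TB of retained indices
replicas = 2                 # replication factor
index_overhead = 1.1         # indexing overhead
usable_disk = 0.95           # fraction of disk actually usable
headroom = 0.8               # keep 20% headroom

need_tb = source_data_tb * replicas * index_overhead / usable_disk / headroom
print(round(need_tb, 2))     # 57.89
```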

So we set an alarm: if free storage drops below 8 TB, it means we have hit our 2x limit and should scale up. Hence we set the alarm as

FreeStorageSpace <= 8388608.00 for 4 datapoints within 5 minutes + Statistic=Average + Duration=1minute

FreeStorageSpace is reported in MB, hence 8 TB = 8388608 MB.

But the alarm fired immediately, because the average free storage per node (at most 3.8 TB on these instances) is always below 8 TB.

After realizing that you need Statistic=Sum to get the cluster-wide free storage, we set the alarm as

FreeStorageSpace <= 8388608.00 for 4 datapoints within 5 minutes + Statistic=Sum + Duration=1minute

The above calculation checked out and we were able to set the right alarms.

The same applies for ClusterUsedSpace calculation.

You should also track the actual free space percentage using CloudWatch metric math:

[screenshot: CloudWatch metric math expression]
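As a rough sketch of what such a metric math expression computes (assuming m1 = FreeStorageSpace and m2 = ClusterUsedSpace, both with Statistic=Sum and both in MB, so the expression is 100 * m1 / (m1 + m2)):

```python
def free_percent(free_sum_mb: float, used_sum_mb: float) -> float:
    """Equivalent of the metric math expression 100 * m1 / (m1 + m2)."""
    return 100.0 * free_sum_mb / (free_sum_mb + used_sum_mb)

# Hypothetical cluster: 12 TB free, 48 TB used -> 20% free.
print(round(free_percent(12_000_000, 48_000_000), 1))  # 20.0
```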

Saurabh Hirani