
We have an ELK (Elasticsearch-Logstash-Kibana) deployment in which we ship logs via Logstash to an Elasticsearch cluster. Indices are created daily. We close indices that are more than 3 days old, and we snapshot indices that are more than 7 days old and push the snapshots to Amazon S3 via Curator.

We have about 10 different daily indices, with an average index size of about 1 GB, a replication factor of 1, and 2 shards per index. Logstash pushes data to the ES cluster at a rate of 2000 log events per second.

Our topology

  • 3 master + data nodes
  • 1 dedicated client node (also runs Kibana)

Hardware Configs

  • 12 cores
  • 64 GB RAM
  • 2 TB spinning disk
  • Debian 7
  • Elasticsearch 1.7.1
  • Logstash 1.5.3

All the standard configuration has been followed, e.g. unicast discovery, and a 30 GB heap has been allotted to Elasticsearch.
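For context, the relevant parts of the configuration look roughly like this (host names below are placeholders):

    # /etc/elasticsearch/elasticsearch.yml (excerpt)
    discovery.zen.ping.multicast.enabled: false
    discovery.zen.ping.unicast.hosts: ["es-data-1", "es-data-2", "es-data-3"]

    # /etc/default/elasticsearch
    ES_HEAP_SIZE=30g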

Right now the snapshot job is run via Curator from the client machine, with the requests sent locally to the ES instance running on that machine. Logstash also sends logs directly to the client node.

The Curator command that is being used:

curator --timeout 21600 --host es-client --port 9200  snapshot --name $snapshot_name_$project-$date --repository walle_elk_archive indices --older-than 3 --time-unit days --timestring %Y-%m-%d --prefix $prefix
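The walle_elk_archive repository is an S3 repository (on Elasticsearch 1.7 that means the cloud-aws plugin); its registration looks roughly like the following, where bucket, region and base_path are placeholders:

    # Register the S3 snapshot repository; requires the cloud-aws plugin on every node.
    # Bucket, region and base_path below are placeholders.
    curl -XPUT 'http://es-client:9200/_snapshot/walle_elk_archive' -d '{
      "type": "s3",
      "settings": {
        "bucket": "my-elk-archive-bucket",
        "region": "us-east-1",
        "base_path": "elk/snapshots"
      }
    }'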

Can someone help me with the following:

  1. Is it ok to run the curator job on the client machine as we have done?
  2. Is it ok to take the snapshot of all the indices from a single machine?
  3. Since logs are pushed continuously, will the cluster become unstable while snapshots are being created and pushed to Amazon S3?
  4. What are the best practices people generally follow for backing up old indices from Elasticsearch?

1 Answer


Is it ok to run the curator job on the client machine as we have done?

Yes, since the "client" machine isn't doing anything except firing REST requests at your ES cluster and waiting for the response.
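If you keep the job on that machine, a cron entry is all it really needs; something like this, where the wrapper script, schedule and log path are made-up placeholders:

    # /etc/cron.d/es-snapshot
    # es-snapshot.sh is a hypothetical wrapper around the Curator command from the question;
    # runs nightly at 01:00 (schedule and paths are illustrative only).
    0 1 * * * root /usr/local/bin/es-snapshot.sh >> /var/log/es-snapshot.log 2>&1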

Is it ok to take the snapshot of all the indices from a single machine?

Again, yes. For the same reason as the first question.

Since logs are pushed continuously, will the cluster become unstable while snapshots are being created and pushed to Amazon S3?

According to the ES docs on Snapshot And Restore:

Snapshotting process is executed in non-blocking fashion. All indexing and searching 
operation can continue to be executed against the index that is being snapshotted.

There might be a slight slowdown in the indexing rate, but based on your machine specs I'd expect it to be fine; there's really no way to know unless you try it. The limiting factor for snapshot speed is likely to be the disks for a shared file system repository, and your Internet connection speed for an S3 repository.
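While you're testing, the snapshot APIs will tell you what's actually going on, e.g.:

    # List all snapshots held in the repository
    curl -XGET 'http://es-client:9200/_snapshot/walle_elk_archive/_all?pretty'

    # Show per-index / per-shard progress of any currently running snapshot
    curl -XGET 'http://es-client:9200/_snapshot/walle_elk_archive/_status?pretty'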

In terms of using an S3 repository and how that affects the process, there aren't many details (as in, none) in the docs for the S3 Repository plugin as to how it actually works. I suspect that each data node holding a primary shard pushes its shards to the repository, S3 or otherwise, which means there's likely no more load on the ES cluster when snapshotting to an S3 repository than there would be with a shared file system repository.
Again, TEST IT, since each environment is unique and what works for one person might not for the next.
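One knob worth testing with is the per-repository snapshot throttle, max_snapshot_bytes_per_sec (it defaults to 40mb per second, per node). It's documented for shared file system repositories and, as far as I know, is honoured by S3 repositories as well, but verify that against your cloud-aws plugin version. A sketch of registering the repository with a lower limit (bucket and region are placeholders):

    # Lower the per-node snapshot throttle on the repository.
    # max_snapshot_bytes_per_sec is documented for fs repositories; check the
    # cloud-aws plugin docs before relying on it for S3. Bucket/region are placeholders.
    curl -XPUT 'http://es-client:9200/_snapshot/walle_elk_archive' -d '{
      "type": "s3",
      "settings": {
        "bucket": "my-elk-archive-bucket",
        "region": "us-east-1",
        "max_snapshot_bytes_per_sec": "20mb"
      }
    }'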

What are the best practices people generally follow for backing up old indices from Elasticsearch?

I find that ES has quite good documentation, and there's a section on Snapshot And Restore. There's not actually much in there in terms of "best" practices, so unless you come across some other sources online, I'd say your best bet is to just start trying things out and see what works for you.
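If you do start experimenting, part of that should be confirming a snapshot can actually be restored. A restore of a single archived index under a new name, so it can't collide with a live daily index, looks like this (snapshot and index names are placeholders):

    # Restore one index from a snapshot under a renamed index.
    # Snapshot and index names below are placeholders.
    curl -XPOST 'http://es-client:9200/_snapshot/walle_elk_archive/snapshot-2015-08-01/_restore' -d '{
      "indices": "logstash-2015-08-01",
      "rename_pattern": "(.+)",
      "rename_replacement": "restored_$1"
    }'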
