How can I begin taking snapshots of an already running ElasticSearch cluster with no prior repository setup?

Question

I've recently started working on ELK Stack in my organisation, and there's a requirement that has got me wondering.

The cluster details are as follows:

Hosted on AWS EC2 instances
No repository has been registered for backups
Curator is up and running, but not yet being used
Using instance store

During my research, I learnt that the best way to back up is the Snapshot API, but the problem is that it requires registering a repository (such as S3) and a node restart.

I've been told that a restart will cause all the data on that node to be lost. Is this true? If not, what would be the best way to begin automated backups without any loss of data, if that's possible?

Thank you.

You didn't mention whether you use EBS or an instance store. Day to day, it's easy to forget that stopping an instance isn't the same as restarting it. Be careful if your disk storage is of type instance store. Check this: https://stackoverflow.com/questions/50731496/lost-aws-ec2-disk-data-after-reboot – Juan Carlos Alafita, Mar 23 '21 at 18:56
@JuanCarlosAlafita Oh! Thank you for that pointer. We are using instance store as of now. If you could shed a little more light: does an ElasticSearch 'node' restart mean an 'instance' restart? – Abhinav Thakur, Mar 25 '21 at 04:30
No. As glenacota said previously, an ElasticSearch 'node' restart means only restarting the ES application/process (like "sudo service elasticsearch restart"). When I said 'instance restart', I meant an EC2 instance restart. For what you need to do (install a plugin), no reboot of the EC2 instance is required. I left my comment so that you take into account the type of store you use for your EC2 instance. Node -> "Any time that you start an instance/process of Elasticsearch, you are starting a node" – Juan Carlos Alafita, Mar 25 '21 at 04:52
So, talking about disk storage, the recommendation is always to use EBS volumes. – Juan Carlos Alafita, Mar 25 '21 at 05:00

score 0 · Accepted Answer · answered Mar 23 '21 at 07:59

You need to restart the node to install the repository-s3 plugin (instructions here: https://www.elastic.co/guide/en/elasticsearch/plugins/current/repository-s3.html). However, restarting a node doesn't mean destroying the EC2 instance with all the attached volumes. In other words, your data will outlive the node restart. The Elasticsearch documentation describes how to stop a node and how to start it again.
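Once the plugin is installed and the node is back up, registering the repository and taking a first snapshot are two calls to the snapshot API. Here is a minimal Python sketch of those two requests, not a definitive setup. The cluster URL, the repository name `my_s3_repository`, the bucket `my-es-snapshots` and the snapshot name `snapshot-1` are all hypothetical placeholders, and it assumes the cluster has no authentication in front of it and that S3 credentials are already available to the node.

```python
import json
import urllib.request


def register_repo_request(es_url, repo, bucket):
    """Build the PUT /_snapshot/<repo> request that registers an S3 repository."""
    body = {"type": "s3", "settings": {"bucket": bucket}}
    return urllib.request.Request(
        f"{es_url}/_snapshot/{repo}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def create_snapshot_request(es_url, repo, snapshot):
    """Build the PUT /_snapshot/<repo>/<snapshot> request that starts a snapshot."""
    # wait_for_completion=true makes the call return only when the snapshot is done.
    return urllib.request.Request(
        f"{es_url}/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
        method="PUT",
    )


def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

With a reachable cluster, you would call `send(register_repo_request("http://localhost:9200", "my_s3_repository", "my-es-snapshots"))` first and then `send(create_snapshot_request("http://localhost:9200", "my_s3_repository", "snapshot-1"))`. Nothing is sent until `send` is called, so you can inspect the requests beforehand.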

answered Mar 23 '21 at 07:59

glenacota

Thank you so much for the response. Just to confirm, restarting a node will not have any effect on the data itself, but will only restart the ElasticSearch processes, right? – Abhinav Thakur Mar 23 '21 at 11:30
That's correct. You can check the `path.data` field of your `elasticsearch.yml` file to see where the data is currently persisted. Restarting a node won't delete the data directory. And once again, by restarting the node I mean restarting the Elasticsearch process on the current EC2 instance :) – glenacota Mar 23 '21 at 12:27
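To make the `path.data` check concrete, here is a rough Python sketch that looks the setting up in the text of an `elasticsearch.yml` file. It is a plain line scan, not a YAML parser: it assumes the flat `path.data: ...` form and does not handle the nested `path:` / `data:` form, list values, quotes or trailing comments. The function name `find_path_data` is made up for this example.

```python
def find_path_data(config_text):
    """Return the value of a flat `path.data:` entry, or None if it is not set."""
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and commented-out settings.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep and key.strip() == "path.data":
            return value.strip() or None
    return None
```

You would pass it the contents of your config file (on DEB/RPM installs this is typically `/etc/elasticsearch/elasticsearch.yml`). If it returns None, the setting is not set explicitly in flat form and the node is using its default data directory or the nested syntax.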
