I use the snapshot method to back up my Elasticsearch nodes. It works as follows:
PUT /_snapshot/my_backup/snapshot_1?wait_for_completion=true
However, data added to Elasticsearch after a snapshot is taken is not contained in that snapshot, so we need to run it periodically. Even then, data will be lost if something goes wrong between two snapshots. Is there any way to handle this?
Is there a continuous backup method for Elasticsearch?

Mairon
-
What do you mean by "there will be a data loss if something goes wrong between 2 snapshots"? – Val Aug 09 '16 at 03:38
-
I mean that data which is added after the last snapshot won't be restored if you restore that snapshot. – Mairon Aug 09 '16 at 03:58
-
Could you rebuild the missing data from another source of truth? That's usually what people do. – Val Aug 09 '16 at 04:06
-
That's a good solution, but does it mean there isn't any way to do what I asked? – Mairon Aug 09 '16 at 05:43
1 Answer
If you want a "backup" of some sort that stays in sync with the data in the cluster, consider building two clusters. Whatever index, update, and delete operations the "main" cluster receives, you need to mirror on the "backup" cluster as well. There is no other way.
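As a rough illustration of the mirroring idea, here is a minimal Python sketch that sends the same document to every cluster and reports which ones failed. The cluster URLs, the `_doc` endpoint path, and the helper names `http_put` and `mirror_index` are assumptions made for this example, not something from the setup above; the right path depends on your Elasticsearch version. It also makes no attempt to keep the clusters consistent when one write fails; it only tells you which cluster needs a retry or rebuild.

```python
import json
import urllib.request


def http_put(url, body):
    """Send a JSON document with HTTP PUT and return the status code."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


def mirror_index(clusters, index, doc_id, doc, send=http_put):
    """Index the same document on every cluster.

    Returns the list of cluster base URLs whose write failed, so the
    caller can retry or queue those writes. `send` is injectable so the
    function can be exercised without a live cluster.
    """
    failed = []
    for base in clusters:
        # Assumed endpoint layout: <base>/<index>/_doc/<id>
        url = f"{base.rstrip('/')}/{index}/_doc/{doc_id}"
        try:
            status = send(url, doc)
            if status not in (200, 201):
                failed.append(base)
        except OSError:
            # urllib's URLError and HTTPError are OSError subclasses.
            failed.append(base)
    return failed
```

A call such as `mirror_index(["http://main:9200", "http://backup:9200"], "logs", "1", {"msg": "hi"})` would write to both (hypothetical) clusters. An empty return value means both writes were accepted; anything else has to be reconciled separately, since nothing here provides atomicity across clusters.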

Andrei Stefan
-
Also worth noting that the "in-sync" part is hard to achieve, especially between two clusters (plus potentially another primary source of truth), since so much can happen (and it's also costly). Since there's no two-phase commit, it's very easy to get out of sync. From experience, it's much easier to have a reliable rebuild process handy that you can quickly leverage when data goes missing. – Val Aug 09 '16 at 06:03
-
Continuous backup means either one of the replicas of the indices (but this means the same hardware, the same cluster), or real-time updates to the backup destination. Usually people take regular snapshots and keep the original source of the data for a shorter period of time (for re-indexing purposes), or index the same data to a mirror cluster. And this is not that uncommon. True, it is costly to set up (duplicate the hardware) and to configure (probably a proxy of some sort or a load balancer), but for real-time, "continuous backup" there's no other way. – Andrei Stefan Aug 09 '16 at 06:24
-
Definitely agree. I was just questioning **the real need** of having such a continuous backup, i.e. the cost/benefit ratio is probably much higher (big cost for low benefit) than having to rebuild some of the data in case something bad happens. But again that depends on the use cases and business constraints. – Val Aug 09 '16 at 06:28
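
The "regular snapshots" approach mentioned in the comments can be sketched roughly as follows. This is a minimal illustration, assuming the `my_backup` repository from the question already exists and the cluster is reachable at a base URL you supply; the helper names `snapshot_url`, `take_snapshot`, and `run_forever`, and the timestamped naming scheme, are made up for this example. Each run uses a fresh snapshot name, so a name is only reused if two runs start within the same second.

```python
import time
import urllib.request
from datetime import datetime, timezone


def snapshot_url(base, repo, now):
    """Build the snapshot URL with a lowercase, timestamped snapshot name."""
    name = "snapshot-" + now.strftime("%Y%m%d-%H%M%S")
    return f"{base.rstrip('/')}/_snapshot/{repo}/{name}?wait_for_completion=true"


def take_snapshot(base, repo):
    """Issue the same PUT as in the question, with a unique snapshot name."""
    url = snapshot_url(base, repo, datetime.now(timezone.utc))
    req = urllib.request.Request(url, method="PUT")
    # wait_for_completion=true blocks until the snapshot finishes,
    # so the timeout is generous.
    with urllib.request.urlopen(req, timeout=3600) as resp:
        return resp.status


def run_forever(base, repo, interval_seconds):
    """Take a snapshot, sleep, repeat. A cron job would work equally well."""
    while True:
        take_snapshot(base, repo)
        time.sleep(interval_seconds)
```

The interval is the upper bound on how much data you may have to rebuild from the original source after a restore, which is why the comments suggest keeping that source around for at least as long as the gap between snapshots.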