
I have a Ceph cluster managed by Rook with a single RGW store on top of it. We are trying to figure out the best backup strategy for this store. We are considering the following options: using rclone to back up objects via the S3 interface, using s3fs-fuse (we haven't tested it yet, but s3fs-fuse is known to be not reliable enough), and using NFS-Ganesha to re-export the RGW store as an NFS share. We are going to have quite a lot of RGW users and quite a lot of buckets, so none of these three solutions scales well for us.

Another possibility is to take snapshots of the RADOS pools backing the RGW store and back up those snapshots, but the RTO would be much higher in that case. Another problem with snapshots is that it does not seem possible to take them consistently across all RGW-backing pools. We never delete objects from the RGW store, so this problem does not seem too serious if we start snapshotting from the metadata pool: all the data it refers to will remain in place even if we snapshot the data pool a bit later. It won't be perfectly consistent, but it should not be broken either. It's not entirely clear how to restore single objects in a timely manner with this snapshotting scheme (to be honest, it's not entirely clear how to restore with this scheme at all), but it seems worth trying.

What other options do we have? Am I missing something?
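To make the snapshot ordering above concrete, here is a rough sketch of what we have in mind, assuming the default RGW pool names (a Rook-managed zone may name its pools differently):

    #!/usr/bin/env bash
    # Sketch only: pool names below are the stock RGW defaults; adjust to your zone.
    set -euo pipefail

    SNAP="rgw-backup-$(date +%Y%m%d-%H%M)"

    # Snapshot the metadata/index pools first and the data pool last; since we
    # never delete objects, everything referenced by the metadata snapshot will
    # still be present when the data pool snapshot is taken a bit later.
    for pool in default.rgw.meta default.rgw.buckets.index default.rgw.buckets.data; do
        ceph osd pool mksnap "$pool" "$SNAP"
    done

    # Restores are RADOS-level, not S3-level: individual objects have to be read
    # back out of the pool snapshot by their RADOS names, e.g.
    #   rados -p default.rgw.buckets.data --snap "$SNAP" get <rados-object-name> /tmp/obj
    # which is exactly why single-object restores are awkward with this scheme.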

Alex
    Pool snapshots are not very helpful; rsync to a different storage backend is a valid option, or you can create a second zone and replicate asynchronously (a multi-site setup). – eblock Nov 11 '20 at 08:51

1 Answer


We're planning to implement Ceph in 2021 and don't expect a large number of users and buckets initially. While waiting for https://tracker.ceph.com/projects/ceph/wiki/Rgw_-_Snapshots, I successfully tested the following approach to protecting the object store, taking advantage of a multisite configuration plus sync policies (https://docs.ceph.com/en/latest/radosgw/multisite-sync-policy/) on the "Octopus" release. Assuming you have all zones in the Prod site zone-synced to the DR site (DRS):

  • Create a zone in the DRS, e.g. "backupZone", that is not zone-synced from or to any of the other Prod or DRS zones;
  • put the endpoints for this backupZone on two or more DRS cluster nodes;
  • using rclone (https://rclone.org/s3/), write a bash script: for each bucket in the DRS zones, create a version-enabled bucket "bucket"-p in the backupZone and schedule a sync, e.g. twice a day, from "bucket" to "bucket"-p (a sketch of such a script follows this list);
  • protect access to the backupZone endpoints so that no ordinary user (or integration) can reach them; they should only be accessible from the other nodes in the cluster (obviously) and from the server running the rclone-based script;
  • when there is a failure, just recover all the objects from the *-p buckets, once again using rclone, back to the original buckets or to a filesystem.
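A minimal sketch of what such a script could look like, assuming two rclone remotes named "prod" and "backup" have already been configured against the DR-zone and backupZone S3 endpoints (the remote names, the endpoint URL and the use of the AWS CLI to enable versioning are illustrative assumptions, not something this setup requires):

    #!/usr/bin/env bash
    # Sketch only: "prod" and "backup" are assumed rclone remotes pointing at the
    # DR-zone and backupZone S3 endpoints respectively.
    set -euo pipefail

    BACKUP_ENDPOINT="https://backupzone.example.internal:8443"   # hypothetical endpoint

    # Enumerate the buckets visible through the prod remote (one per line).
    rclone lsd prod: | awk '{print $NF}' | while read -r bucket; do
        dest="${bucket}-p"

        # Create the protected bucket if it does not exist yet.
        rclone mkdir "backup:${dest}"

        # Enable versioning on the protected bucket (shown with the AWS CLI here;
        # any S3 client that can call PutBucketVersioning would do).
        aws --endpoint-url "$BACKUP_ENDPOINT" s3api put-bucket-versioning \
            --bucket "$dest" \
            --versioning-configuration Status=Enabled

        # Push the current state of the bucket into its protected counterpart.
        rclone sync "prod:${bucket}" "backup:${dest}"
    done

Run it from cron, e.g. twice a day. Recovery is the same operation in the other direction, e.g. rclone sync backup:mybucket-p prod:mybucket, or rclone copy backup:mybucket-p /restore/mybucket to land the objects on a filesystem.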

This protects against the following failures:

Infra:

  • Bucket or pool failure;
  • Pervasive object corruption;
  • Loss of a site.

Human error:

  • Deletion of versions or objects;
  • Removal of buckets;
  • Elimination of entire pools.

Notes:

  • Only the latest version of each object is synced to the protected (*-p) bucket, but if the script runs several times you accumulate the successive states of each object over time;
  • when an object is deleted in the prod bucket, rclone just flags the object with a DeleteMarker in the protected bucket upon sync (a quick way to inspect this is shown below);
  • this does not scale! As the number of buckets increases, the time to sync becomes untenable.
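To inspect what a protected bucket actually retains (the object versions and delete markers mentioned above), something like the following can be run against the backupZone endpoint, reusing the hypothetical endpoint from the sketch above:

    # List all versions and delete markers kept in a protected bucket.
    aws --endpoint-url "$BACKUP_ENDPOINT" s3api list-object-versions --bucket mybucket-p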
Goulart