
We are using two servers separated by a WAN to replicate approximately 1TB of data.

On the master side we have a single server with a Gluster volume exported to a number of other servers that write in data.

On the slave side we have a single server with a Gluster volume exported as a read only share to disaster recovery servers.

Over time the slave has fallen out of sync with the master to the tune of 200GB: files that should be there aren't, and files that were deleted on the master are still present. There does not appear to be much consistency in which files are affected.

What is the simplest way to force Gluster to checksum every file on the slave and re-replicate where required?

The documentation suggests:

Description: GlusterFS Geo-replication did not synchronize the data completely, but the geo-replication status still displays OK.

Solution: You can enforce a full sync of the data by erasing the index and restarting GlusterFS Geo-replication. After restarting, GlusterFS Geo-replication begins synchronizing all the data; that is, all files will be compared by means of checksumming, which can be a lengthy, resource-intensive operation, mainly on large data sets (however, actual data loss will not occur). If the error situation persists, contact Gluster Support.

But does not refer to where this index may be.

# gluster volume geo-replication share gluk1::share stop
Stopping geo-replication session between share & gluk1::share has been successful
# gluster volume set share geo-replication.indexing off
volume set: failed: geo-replication.indexing cannot be disabled while geo-replication sessions exist

Disabling the index fails while any geo-replication session still exists, and the documentation doesn't mention this requirement.
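The error message itself suggests the missing step: the session apparently has to be deleted, not merely stopped, before indexing can be disabled. A possible sequence would be the following (untested; the exact syntax varies between Gluster versions, and `create push-pem` assumes passwordless SSH is already set up between the nodes):

```shell
# Stop the running geo-replication session
gluster volume geo-replication share gluk1::share stop

# Delete the session entirely, so that no session "exists" any more
gluster volume geo-replication share gluk1::share delete

# Now the index can be turned off, forcing a full crawl on the next run
gluster volume set share geo-replication.indexing off

# Recreate and restart the session; all files should be re-compared
gluster volume geo-replication share gluk1::share create push-pem
gluster volume geo-replication share gluk1::share start
```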

Any suggestions?

Antitribu

1 Answer


Your slave became out of sync because GlusterFS geo-replication is not meant for multiple changing data pools (a distributed FS), but rather for disaster recovery (a read-only backup).

In short, geo-replication is a master/slave model, where only the master site accepts writes/changes, and any changes are periodically synced to the remote read-only slave.

To have a true distributed, replicated filesystem you would have to use GlusterFS's "Replicated Volume" feature. The drawback is that with the current replication scheme writes are forced to be synchronous: this means that if you are replicating across a WAN link, even your local, intra-LAN writes will be as slow as the WAN path. To overcome this limit, a "New Style Replication" is being considered for inclusion, but it does not seem to be implemented yet (at least in stable, enterprise distributions).
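For illustration, a replicated volume is created by listing one brick per replica at volume-creation time (a sketch; the server names and brick paths are placeholders, and note that 2-way replicas are split-brain-prone — current versions generally recommend `replica 3` or an arbiter):

```shell
# Create a 2-way replicated volume across two servers (placeholder paths)
gluster volume create share replica 2 transport tcp \
    server1:/data/brick1/share server2:/data/brick1/share
gluster volume start share

# Every write now goes synchronously to both bricks,
# which is why WAN latency affects local writers too.
```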

Back to your current situation: you are in a classical split-brain scenario, and I am not sure what you can do. Your master and slave have different views of the underlying volumes, and they have probably accumulated different, incompatible changes to the same files. I think you would have to (more or less) manually review them...

shodanshok
  • Hi, when this was active the slave was in read only and no local changes were made at all. Only changes pushed from the master happened there however the slave still became out of sync. We deleted the cluster multiple times and rebuilt to get it up to date, each time it would function for a small amount of time and then become out of sync. Unfortunately we didn't end up solving this and the issues became too much and gluster was dropped as a solution. – Antitribu May 22 '15 at 15:58
  • Hi, geo-replication basically use rsync to keep all replication legs updated and in sync. If you only need read-only slaves, why don't directly use rsync to keep them in sync? – shodanshok May 22 '15 at 20:35
  • We have software that was trying to read in archive logs that were being synced over; unfortunately there were a large number of files and rsync was taking too long to notice file changes. We were also trying to find something more supported than a hack of rsync/inotify. Rsyncd might have been an option, but honestly bi-directional sync was a nice plus to have. Ultimately we will end up in the scenario of multiple sites and want to be able to support that. – Antitribu May 25 '15 at 11:10
  • But GlusterFS geo-replication is **not** a bidirectional sync. It is a single, one-way sync: from master to slave. – shodanshok May 25 '15 at 13:31
  • As previously mentioned I'm aware it is one way only and as said, it's a nice plus to have, not a requirement. If a simple rsync on a cronjob met the requirements it would have been used, it doesn't which is why gluster was tried. – Antitribu May 25 '15 at 16:28
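For anyone landing here with only the read-only requirement discussed in the comments above, the plain-rsync alternative could be as small as a cron job (a sketch; the host and paths are placeholders, and `--delete` makes the slave mirror deletions as well):

```shell
# Mirror the master's export to the slave over SSH every 5 minutes
# (crontab entry: */5 * * * * rsync -az --delete -e ssh /export/share/ gluk1:/export/share/)
rsync -az --delete -e ssh /export/share/ gluk1:/export/share/
```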