We are using two servers separated by a WAN to replicate approximately 1TB of data.
On the master side we have a single server with a Gluster volume exported to a number of other servers, which write data into it.
On the slave side we have a single server with a Gluster volume exported as a read-only share to disaster recovery servers.
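Replication between the two is handled by a geo-replication session between the volumes; the status I have been going by is just the built-in check (same volume and slave names as in the commands further down):

# gluster volume geo-replication share gluk1::share status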
Over time the slave has drifted out of sync with the master to the tune of roughly 200GB: files that should be there aren't, and files that were deleted on the master are still present on the slave. There does not appear to be much of a pattern to which files are affected.
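To illustrate what I mean by checksumming, this is roughly the comparison I have in mind, sketched as a dry-run rsync over locally mounted copies of both volumes (the mount points below are placeholders, and I have not run this across the full data set):

# rsync --dry-run --recursive --checksum --delete --itemize-changes /mnt/share-master/ /mnt/share-slave/

Doing that by hand over the WAN for about 1TB isn't attractive, which is why I'd rather have Gluster do it itself.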
What is the simplest way to force Gluster to checksum every file on the slave and re-replicate where required?
The documentation suggests:
Description: GlusterFS Geo-replication did not synchronize the data completely, but the geo-replication status still displays OK.
Solution: You can enforce a full sync of the data by erasing the index and restarting GlusterFS Geo-replication. After restarting, GlusterFS Geo-replication begins synchronizing all the data; that is, all files are compared by means of checksums, which can be a lengthy, resource-intensive operation, mainly on large data sets (however, actual data loss will not occur). If the error situation persists, contact Gluster Support.
But it does not say where this index actually lives.
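My best guess is that the "index" refers to the xtime extended attributes that the indexing/marker translator stamps on files on the bricks, but that really is a guess; the only way I've found to even look at them is directly on a brick with getfattr (the brick path below is a placeholder for one of ours):

# getfattr -d -m . -e hex /export/brick1/path/to/some/file

Following the documented procedure as literally as I can, this is what I have tried so far: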
# gluster volume geo-replication share gluk1::share stop
Stopping geo-replication session between share & gluk1::share has been successful
# gluster volume set share geo-replication.indexing off
volume set: failed: geo-replication.indexing cannot be disabled while geo-replication sessions exist
Disabling indexing fails as long as the geo-replication session exists at all, even in the stopped state, and the documentation doesn't mention this requirement.
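My working theory is that the session has to be deleted outright, not merely stopped, before indexing can be turned off, and then recreated and restarted afterwards, roughly along these lines (untested, and I'm not certain the delete and create push-pem subcommands are even available in the version we're running):

# gluster volume geo-replication share gluk1::share delete
# gluster volume set share geo-replication.indexing off
# gluster volume geo-replication share gluk1::share create push-pem force
# gluster volume geo-replication share gluk1::share start

I'm reluctant to tear down the session on a production volume on a hunch, though.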
Any suggestions?