I have been experiencing data corruption when writing to replicated GlusterFS volumes that I have configured across two servers.
The configuration I have set up is as follows:
- Servers are running Ubuntu 16.04 and GlusterFS v3.10.6
- Clients are running Ubuntu 14.04 and GlusterFS v3.10.6
- Two GlusterFS volumes have been configured, each with two bricks in a replica 2 layout, one brick on each server (see the sketch below this list).
- Each brick is an MDADM RAID5 array with an EXT4 file system on top of LUKS.
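For reference, the volumes were created along roughly these lines; the hostnames, volume names and brick paths below are placeholders rather than the exact ones in use:

    # Replica 2 volume: one brick per server (names are illustrative)
    gluster volume create vol1 replica 2 \
        server1:/data/brick1/vol1 \
        server2:/data/brick1/vol1
    gluster volume start vol1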
Each volume is configured with the default options, plus bitrot detection. These are as follows:
    features.scrub: Active
    features.bitrot: on
    features.inode-quota: on
    features.quota: on
    nfs.disable: on
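The bitrot and quota features were enabled through the standard gluster CLI, i.e. something like the following (volume name again a placeholder):

    gluster volume bitrot vol1 enable   # turns on bitrot detection and scrubbing
    gluster volume quota vol1 enable    # enables the quota features listed above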
The data corruption manifests itself when large directories are copied from the local file system of one of the client machines to either of the configured GlusterFS volumes. When MD5 checksums are calculated for the copied files and the source files and the two are compared, a number of the checksums differ.
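The comparison is done along these lines (the paths here are illustrative):

    # Checksum the source tree on the client's local disk, then verify the
    # copy on the GlusterFS mount against those checksums
    cd /data/source && find . -type f -exec md5sum {} + > /tmp/source.md5
    cd /mnt/gluster-vol1/source && md5sum -c /tmp/source.md5 | grep -v ': OK$'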
Manually triggering a self-heal on either GlusterFS volume shows no files identified for healing. Additionally, neither the output of gluster volume bitrot <volname> scrub status nor the logs in /var/log/glusterfs/bitd.log and /var/log/glusterfs/scrub.log identify any errors.
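Concretely, the checks I have been running are roughly the following (volume name is a placeholder, and the exact heal invocation may differ slightly):

    gluster volume heal vol1 full              # trigger a self-heal
    gluster volume heal vol1 info              # shows no entries needing heal
    gluster volume bitrot vol1 scrub status    # reports no corrupted objects
    grep -iE 'error|corrupt' /var/log/glusterfs/bitd.log /var/log/glusterfs/scrub.log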
These issues have only manifested themselves recently, after around a week of both volumes being used fairly heavily by ~10 clients.
I have tried taking the volumes offline and writing data to each of the bricks directly via the underlying local file system, and have not been able to reproduce the issue.
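That test was roughly of the following form, run on each server against the file system backing the brick (paths are illustrative):

    # With the volume stopped, copy a test tree onto the brick's underlying
    # file system and verify its checksums there
    gluster volume stop vol1            # answer the confirmation prompt
    cp -r /root/testdata /data/brick1/direct-test
    cd /root/testdata && find . -type f -exec md5sum {} + > /tmp/testdata.md5
    cd /data/brick1/direct-test && md5sum -c /tmp/testdata.md5 | grep -v ': OK$'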
To debug the issue further, I have configured a similar setup on VMs in VirtualBox and have not been able to reproduce the problem there either. I am therefore at rather a loss as to what the cause of these errors may be.
Any suggestions for further debugging steps I could take, or known issues with GlusterFS and my configuration, would be appreciated.