
I have a two-node GlusterFS setup with two replicas on each node. One of the systems got overloaded somehow, and then things started to go wrong. I currently have all applications shut down, and I'm short of ideas on how to bring it back. I can start the volume, but some files seem to be corrupted.

I ran gluster volume heal kvm1. Now gluster volume heal kvm1 info shows a long list of GFIDs, such as:

<gfid:57d68ac5-5ae7-4d14-a65e-9b6bbe0f83a3>
<gfid:c725a364-93c5-4d98-9887-bc970412f124>
<gfid:8178c200-4c9a-407b-8954-08042e45bfce>
<gfid:b28866fa-6d29-4d2d-9f71-571a7f0403bd>

I'm not sure it is actually 'healing' anything. The number of entries has been steady. How can I confirm the healing process is actually working?

# gluster volume heal kvm1 info|egrep 'Brick|entries'
Brick f24p:/data/glusterfs/kvm1/brick1/brick
Number of entries: 5
Brick f23p:/data/glusterfs/kvm1/brick1/brick
Number of entries: 216
Brick f23p:/bricks/brick1/kvm1
Number of entries: 6
Brick f24p:/bricks/brick2/kvm1
Number of entries: 1
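
For reference, this is how I have been watching whether those counts move at all (assuming watch is available on the nodes); if healing were progressing I would expect the numbers to shrink over time:

# watch -n 60 "gluster volume heal kvm1 info | egrep 'Brick|entries'"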

# gluster volume status
Status of volume: kvm1
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick f24p:/data/glusterfs/kvm1/brick1/brick       49160   Y       5937
Brick f23p:/data/glusterfs/kvm1/brick1/brick       49153   Y       5766
Brick f23p:/bricks/brick1/kvm1                     49154   Y       5770
Brick f24p:/bricks/brick2/kvm1                     49161   Y       5941
NFS Server on localhost                            2049    Y       5785
Self-heal Daemon on localhost                      N/A     Y       5789
NFS Server on f24p                                 2049    Y       5919
Self-heal Daemon on f24p                           N/A     Y       5923

There are no active volume tasks
Billy K

2 Answers


I was in the same state:

  • 2 replicas
  • gluster volume heal myVolume info was showing GFIDs on one of the bricks

I found this script, which resolves a GFID into a file path: https://gist.github.com/semiosis/4392640

My interpretation is the following (using your first GFID as an example), on the node where the gluster command reports the GFID:

The file %yourBrickPath%/.glusterfs/57/d6/57d68ac5-5ae7-4d14-a65e-9b6bbe0f83a3 is a hard link pointing to an inode.

In a normal situation you would have a file (in your production directory) pointing to the same inode; for some reason, that hard link is no longer present.
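
Since the .glusterfs entry is a hard link (for regular files), you can try to find the matching production path, if it still exists, by searching the brick for another name on the same inode. A rough sketch, using your first GFID and %yourBrickPath% as a placeholder:

# find %yourBrickPath% -samefile %yourBrickPath%/.glusterfs/57/d6/57d68ac5-5ae7-4d14-a65e-9b6bbe0f83a3 -not -path "*/.glusterfs/*"

If this prints a path, that is the file the GFID refers to; if it prints nothing, the production hard link is gone, which is the situation described below.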

I see two solutions (a rough sketch of both follows below):

  • You recreate the missing hard link in your production directory (and make sure the other node ends up in the same state)
  • You have no way to find out what the file name was (that was my case, as nothing was on the other node), so you remove %yourBrickPath%/.glusterfs/57/d6/57d68ac5-5ae7-4d14-a65e-9b6bbe0f83a3
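
As a sketch only (the target name in option 1 is a placeholder you have to work out yourself, and I would back up the brick before touching anything):

Option 1, recreate the missing hard link in the production directory:

# ln %yourBrickPath%/.glusterfs/57/d6/57d68ac5-5ae7-4d14-a65e-9b6bbe0f83a3 %yourBrickPath%/path/to/original-file

Option 2, the name cannot be recovered, so remove the orphaned .glusterfs entry:

# rm %yourBrickPath%/.glusterfs/57/d6/57d68ac5-5ae7-4d14-a65e-9b6bbe0f83a3

After either option, I would re-run gluster volume heal kvm1 and check whether the entry disappears from the info output.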

Edit: The content of the file might help you identify it.

Amertine

You may have stumbled on this bug if you are running version < 3.7.7:

https://bugzilla.redhat.com/show_bug.cgi?id=1284863

Check if any of your glustershd logs show "Couldn't get xlator xl-0".
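
Assuming the default log location, something like this should be enough to check:

# grep "Couldn't get xlator" /var/log/glusterfs/glustershd.log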

The fix is in 3.7.7. However, a workaround for older versions would be welcome if anyone finds one.