I have a two-node GlusterFS setup with two bricks per node in a replica 2 volume. One of the nodes was overloaded somehow, and then things started to go wrong. I currently have all applications shut down, and I'm short of ideas on how to bring it back. I can start the volume, but some files seem to be corrupted.
I ran gluster volume heal kvm1. Now gluster volume heal kvm1 info shows a long list of gfid entries, such as:
<gfid:57d68ac5-5ae7-4d14-a65e-9b6bbe0f83a3>
<gfid:c725a364-93c5-4d98-9887-bc970412f124>
<gfid:8178c200-4c9a-407b-8954-08042e45bfce>
<gfid:b28866fa-6d29-4d2d-9f71-571a7f0403bd>
I'm not sure it is actually healing anything: the number of entries has stayed roughly the same. How can I confirm that the healing process is actually working?
# gluster volume heal kvm1 info|egrep 'Brick|entries'
Brick f24p:/data/glusterfs/kvm1/brick1/brick
Number of entries: 5
Brick f23p:/data/glusterfs/kvm1/brick1/brick
Number of entries: 216
Brick f23p:/bricks/brick1/kvm1
Number of entries: 6
Brick f24p:/bricks/brick2/kvm1
Number of entries: 1
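To see whether those counts actually move over time, I was planning to re-run the heal queries periodically, something like the following (assuming these subcommands exist in my GlusterFS version; the 60-second interval is arbitrary), and to watch /var/log/glusterfs/glustershd.log on both nodes for heal activity or errors:
# watch -n 60 'gluster volume heal kvm1 statistics heal-count'
# gluster volume heal kvm1 info split-brain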
# gluster volume status
Status of volume: kvm1
Gluster process Port Online Pid
------------------------------------------------------------------------------
Brick f24p:/data/glusterfs/kvm1/brick1/brick 49160 Y 5937
Brick f23p:/data/glusterfs/kvm1/brick1/brick 49153 Y 5766
Brick f23p:/bricks/brick1/kvm1 49154 Y 5770
Brick f24p:/bricks/brick2/kvm1 49161 Y 5941
NFS Server on localhost 2049 Y 5785
Self-heal Daemon on localhost N/A Y 5789
NFS Server on f24p 2049 Y 5919
Self-heal Daemon on f24p N/A Y 5923
There are no active volume tasks
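In case it helps with diagnosis: to map one of the gfid entries back to an actual file name, I was going to look up the hard link under the brick's .glusterfs directory (shown here for the first gfid on the f24p brick; as far as I understand this only works for regular files, since directories are stored as symlinks there):
# B=/data/glusterfs/kvm1/brick1/brick
# find "$B" -samefile "$B/.glusterfs/57/d6/57d68ac5-5ae7-4d14-a65e-9b6bbe0f83a3" -not -path '*/.glusterfs/*'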