
I am having an almost identical issue, so I can provide more details on how the setup looks:

2x servers, replica 2 Gluster volume built from two bricks:

    Gluster process                          TCP Port  RDMA Port  Online  Pid
    ------------------------------------------------------------------------------
    Brick IMG-01:/images/storage/brick1      49152     0          Y       3497
    Brick IMG-02:/images/storage/brick1      49152     0          Y       3512
    NFS Server on localhost                  N/A       N/A        N       N/A
    Self-heal Daemon on localhost            N/A       N/A        Y       3490
    NFS Server on IMG-02                     N/A       N/A        N       N/A
    Self-heal Daemon on IMG-02               N/A       N/A        Y       3505

    Task Status of Volume gv1
    ------------------------------------------------------------------------------
    There are no active volume tasks
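
For reference, a volume like this would typically be created roughly as follows; this is a sketch reconstructed from the brick paths shown above, not my exact commands:

    # Sketch: create and start a replica-2 volume named gv1 from the two bricks above
    gluster volume create gv1 replica 2 \
        IMG-01:/images/storage/brick1 \
        IMG-02:/images/storage/brick1
    gluster volume start gv1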

To allow HA, I did this on the Gluster client side (in /etc/fstab):

   IMG-01:/gv1  /mnt/glustervol1 glusterfs  _netdev,backupvolfile-server=IMG-02,direct-io-mode=disable,log-level=WARNING,log-file=/var/log/gluster.log  0    0
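
That fstab entry should be equivalent to mounting by hand like this (same options, just spelled out on the command line):

    # Manual mount equivalent of the fstab line above
    mount -t glusterfs \
        -o backupvolfile-server=IMG-02,direct-io-mode=disable,log-level=WARNING,log-file=/var/log/gluster.log \
        IMG-01:/gv1 /mnt/glustervol1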

The glusterfs-server version is 3.7 on Ubuntu 16.04, and the clients run glusterfs 3.8 on Ubuntu 14.04. The Gluster servers communicate over a direct InfiniBand connection on a /30 subnet, while the clients connect through a 1G Ethernet interface.

Now, whenever one of the servers is out for any reason, say a reboot or service unavailability, the clients keep their connections but fail to read or write, and eventually the clients freeze as well. If the servers are replicas of each other, and the clients have a backup volfile server configured, why do they not fail over to the remaining node?


1 Answer


Clarification and a possible explanation, which could be an answer to the question above:

a. A 2x replica volume can in fact provide HA if your files, like mine, are of a non-editable nature, i.e. images. In case of a failure on the main GlusterFS storage node, the secondary will keep serving and will accept writes; once the main Gluster server is available again, it runs self-heal and can return to service.

b. In my case there was an underlying culprit: a huge number of image files written to a single folder by an app, i.e. more than 500,000 images within a couple of days, without really managing the structure and hierarchy. This eventually led to the two servers being unable to sync with each other and eventually broke the service response. By fixing the directory layout and creating sub-directories we fixed it.
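
For anyone hitting the same thing, this is roughly the kind of restructuring we did. The paths and the two-character hash prefix are only an illustration, not our exact layout:

    # Illustrative sketch only: spread a flat image directory into hashed
    # sub-directories so no single directory holds hundreds of thousands of files.
    cd /mnt/glustervol1/images
    for f in *; do
        [ -f "$f" ] || continue                     # skip anything that is not a regular file
        d=$(printf '%s' "$f" | md5sum | cut -c1-2)  # first two hex chars of the filename hash
        mkdir -p "$d"
        mv -- "$f" "$d/"
    done

After a restructure like this you can watch `gluster volume heal gv1 info` to confirm the self-heal backlog drains.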
