
We are currently running a three-node cluster on Gluster 3.6.4.

On one of our nodes we noticed that the glusterd daemon is dead.

But the glusterfsd daemons are still running, and we believe clients are connecting and retrieving data.

It turns out the daemon has been dead for a week without us noticing. The NFS distributed mounts continued to work normally.
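(For reference, this is roughly how we checked; the volume name DAOS is taken from the logs below, and because glusterd is down on the affected node, the gluster CLI has to be run from one of the healthy peers.)

# On the affected node: brick daemons are still up even though glusterd is not
pgrep -lx glusterfsd    # brick daemons: still running
pgrep -lx glusterd      # management daemon: no output, it is dead

# From a healthy peer: bricks on the affected node still show as online,
# while the affected node itself shows as Disconnected in peer status
gluster volume status DAOS
gluster peer status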

We would like to know: are we safe to just go ahead and start the glusterd service again?

If so, would this trigger a self-heal on all volumes? That could cause a performance issue.

The logs for this node are as follows:

[2016-08-19 18:01:52.804453] E [rpc-clnt.c:362:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1e0)[0x7f4f3ffca550] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7f4f3fd9f787] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f4f3fd9f89e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x91)[0x7f4f3fd9f951] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x15f)[0x7f4f3fd9ff1f] ))))) 0-DAOS-client-4: forced unwinding frame type(GF-DUMP) op(DUMP(1)) called at 2016-08-19 18:01:51.886737 (xid=0x144a1d)
[2016-08-19 18:01:52.804480] W [client-handshake.c:1588:client_dump_version_cbk] 0-DAOS-client-4: received RPC status error
[2016-08-19 18:01:52.804504] W [socket.c:620:__socket_rwv] 0-glusterfs: readv on 127.0.0.1:24007 failed (No data available)
[2016-08-19 18:02:02.900863] E [socket.c:2276:socket_connect_finish] 0-glusterfs: connection to 127.0.0.1:24007 failed (Connection refused)

If we aren't safe to do so, what else should we do to resolve this?

(Useful information: this blog entry discusses the difference between glusterfsd and glusterd: http://blog.nixpanic.net/2013/12/gluster-and-not-restarting-brick.html)

Vorsprung

1 Answer


Yes, your volumes can't self-heal without a requisite number of nodes voting on the issue. And yes, self-healing should resume when you start glusterd.service; however, it will only heal files that have been marked as in need of healing.
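If you want to see what is actually marked for healing before touching anything, something along these lines should show it (run from a node where glusterd is still up; DAOS is the volume name from your logs, and the heal commands are only meaningful for replicated volumes):

# Files currently marked as needing heal, listed per brick
gluster volume heal DAOS info

# Just the counts per brick, if the full list is long
gluster volume heal DAOS statistics heal-count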

Since you didn't notice that glusterd was missing, I'm assuming you don't modify bricks or volumes much on this cluster. And since the glusterfsd daemons are all still running, self-healing shouldn't be needed for the most part.

The biggest thing to consider is that self-healing is less like a patrol read and more like a selective scrub: it only works on files that have been tagged as dirty. With that in mind, starting the glusterd daemon isn't much of a concern.
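A minimal sequence for bringing it back and confirming nothing unexpected kicks off might look like this (systemctl assumes a systemd host, as the glusterd.service reference above suggests; on older init systems use service glusterd start instead, and again DAOS is the volume name from your logs):

# On the affected node
systemctl start glusterd
systemctl status glusterd

# Check that it has rejoined the cluster and the bricks are still online
gluster peer status
gluster volume status DAOS

# Watch heal activity; only files that were marked dirty should appear here
gluster volume heal DAOS info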

Spooler