1

FIXED, for me at least. And I have no idea how. I've run through the logs from when it wasn't working to now, when it is, and I cannot for the life of me see any difference. What is different (and I don't know if it's a coincidence) is that when I execute

gluster volume status

on both nodes, they both say Task Status of Volume glustervol1, whereas before on server2 it was the hostname of the box. I have no idea how that happened. But it did... I don't know if that is what fixed it, but it happened on its own after numerous reboots.

Good luck.

STILL?! There's a lot of writing on this from around 2014 on Ubuntu 14.04 using init. I'm running CentOS 7.3-1611, fully patched, with kernel 3.10.0-514.10.2.el7, and gluster volumes still don't mount after reboot on servers where the LVM bricks and the client volume mount are on the same server.

I have 3 boxes:

  • server1: (server peer1) and client
  • server2: (server peer2) and client
  • server3: client only

They are using an LVM backend, and the gluster volume should mount to /data/glusterfs. The issue isn't present on server3, where it's only a client; it connects and mounts using the same rules as the other servers. I've dug into the data logs, into SELinux, and into the startup logs. I can't find a way around it. I've considered CTDB and tried autofs, to no avail.

gluster version

glusterfs 3.10.0
Repository revision: git://git.gluster.org/glusterfs.git
Copyright (c) 2006-2016 Red Hat, Inc. <https://www.gluster.org/>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
It is licensed to you under your choice of the GNU Lesser General Public License, version 3 or any later version (LGPLv3 or later), or the GNU General Public License, version 2 (GPLv2), in all cases as published by the Free Software Foundation.

fstab

/dev/vg_gluster/brick1  /data/bricks/brick1  xfs        defaults          0 0
gluster1:/glustervol1   /data/glusterfs      glusterfs  defaults,_netdev  0 0

What's expected

sdb                   LVM2_member  6QrvQI-v5L9-bds3-BUn0-ySdB-hDmz-nVojpX
└─vg_gluster-brick1   xfs          d181747c-8ed3-430c-bd1c-0b7968666dfe   /data/bricks/brick1

and

gluster1:/glustervol1   49G   33M   49G   1%   /data/glusterfs

This works when running a manual mount -t glusterfs... or by executing mount -a with the rules in my fstab, but it will not work on boot. I've read that it's something to do with the mounts being attempted before the daemon has started. What is the best workaround for this? Is it to edit systemd files? Does anyone know a fix?

This is a log snippet from a fresh boot while trying to mount through fstab, where it says that there is no brick process running.

[2017-04-03 16:35:47.353523] I [MSGID: 100030] [glusterfsd.c:2460:main] 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.10.0 (args: /usr/sbin/glusterfs --volfile-server=gluster1 --volfile-id=/glustervol1 /data/glusterfs)
[2017-04-03 16:35:47.456915] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 1
[2017-04-03 16:35:48.711381] I [afr.c:94:fix_quorum_options] 0-glustervol1-replicate-0: reindeer: incoming qtype = none
[2017-04-03 16:35:48.711398] I [afr.c:116:fix_quorum_options] 0-glustervol1-replicate-0: reindeer: quorum_count = 0
[2017-04-03 16:35:48.712437] I [socket.c:4120:socket_init] 0-glustervol1-client-1: SSL support on the I/O path is ENABLED
[2017-04-03 16:35:48.712451] I [socket.c:4140:socket_init] 0-glustervol1-client-1: using private polling thread
[2017-04-03 16:35:48.712892] E [socket.c:4201:socket_init] 0-glustervol1-client-1: failed to open /etc/ssl/dhparam.pem, DH ciphers are disabled
[2017-04-03 16:35:48.713139] I [MSGID: 101190] [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
[2017-04-03 16:35:48.759228] I [socket.c:4120:socket_init] 0-glustervol1-client-0: SSL support on the I/O path is ENABLED
[2017-04-03 16:35:48.759243] I [socket.c:4140:socket_init] 0-glustervol1-client-0: using private polling thread
[2017-04-03 16:35:48.759308] E [socket.c:4201:socket_init] 0-glustervol1-client-0: failed to open /etc/ssl/dhparam.pem, DH ciphers are disabled
[2017-04-03 16:35:48.759596] W [MSGID: 101174] [graph.c:361:_log_if_unknown_option] 0-glustervol1-readdir-ahead: option 'parallel-readdir' is not recognized
[2017-04-03 16:35:48.759680] I [MSGID: 114020] [client.c:2352:notify] 0-glustervol1-client-0: parent translators are ready, attempting connect on transport
[2017-04-03 16:35:48.762408] I [MSGID: 114020] [client.c:2352:notify] 0-glustervol1-client-1: parent translators are ready, attempting connect on transport
[2017-04-03 16:35:48.904234] E [MSGID: 114058] [client-handshake.c:1538:client_query_portmap_cbk] 0-glustervol1-client-0: failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
[2017-04-03 16:35:48.904286] I [MSGID: 114018] [client.c:2276:client_rpc_notify] 0-glustervol1-client-0: disconnected from glustervol1-client-0. Client process will keep trying to connect to glusterd until brick's port is available
Final graph:
+------------------------------------------------------------------------------+
  1: volume glustervol1-client-0
  2:     type protocol/client
  3:     option ping-timeout 42
  4:     option remote-host gluster1
  5:     option remote-subvolume /data/bricks/brick1/brick
  6:     option transport-type socket
  7:     option transport.address-family inet
  8:     option username xxx
  9:     option password xxx
 10:     option transport.socket.ssl-enabled on
 11:     option send-gids true
 12: end-volume
 13:
 14: volume glustervol1-client-1
 15:     type protocol/client
 16:     option ping-timeout 42
 17:     option remote-host gluster2
 18:     option remote-subvolume /data/bricks/brick1/brick
 19:     option transport-type socket
 20:     option transport.address-family inet
 21:     option username xxx
 22:     option password xxx
 23:     option transport.socket.ssl-enabled on
 24:     option send-gids true
 25: end-volume
 26:
 27: volume glustervol1-replicate-0
 28:     type cluster/replicate
 29:     option afr-pending-xattr glustervol1-client-0,glustervol1-client-1
 30:     option use-compound-fops off
 31:     subvolumes glustervol1-client-0 glustervol1-client-1
 32: end-volume
 33:
 34: volume glustervol1-dht
 35:     type cluster/distribute
 36:     option lock-migration off
 37:     subvolumes glustervol1-replicate-0
 38: end-volume
 39:
 40: volume glustervol1-write-behind
 41:     type performance/write-behind
 42:     subvolumes glustervol1-dht
 43: end-volume
 44:
 45: volume glustervol1-read-ahead
 46:     type performance/read-ahead
 47:     subvolumes glustervol1-write-behind
 48: end-volume
 49:
 50: volume glustervol1-readdir-ahead
 51:     type performance/readdir-ahead
 52:     option parallel-readdir off
 53:     option rda-request-size 131072
 54:     option rda-cache-limit 10MB
 55:     subvolumes glustervol1-read-ahead
 56: end-volume
 57:
 58: volume glustervol1-io-cache
 59:     type performance/io-cache
 60:     subvolumes glustervol1-readdir-ahead
 61: end-volume
 62:
 63: volume glustervol1-quick-read
 64:     type performance/quick-read
 65:     subvolumes glustervol1-io-cache
 66: end-volume
 67:
 68: volume glustervol1-open-behind
 69:     type performance/open-behind
 70:     subvolumes glustervol1-quick-read
 71: end-volume
 72:
 73: volume glustervol1-md-cache
 74:     type performance/md-cache
 75:     subvolumes glustervol1-open-behind
 76: end-volume
 77:
 78: volume glustervol1
 79:     type debug/io-stats
 80:     option log-level INFO
 81:     option latency-measurement off
 82:     option count-fop-hits off
 83:     subvolumes glustervol1-md-cache
 84: end-volume
 85:
 86: volume meta-autoload
 87:     type meta
 88:     subvolumes glustervol1
 89: end-volume
 90:
+------------------------------------------------------------------------------+
[2017-04-03 16:35:48.949500] I [rpc-clnt.c:1964:rpc_clnt_reconfig] 0-glustervol1-client-1: changing port to 49152 (from 0)
[2017-04-03 16:35:49.105087] I [socket.c:348:ssl_setup_connection] 0-glustervol1-client-1: peer CN = <name>
[2017-04-03 16:35:49.105103] I [socket.c:351:ssl_setup_connection] 0-glustervol1-client-1: SSL verification succeeded (client: <ip>:24007)
[2017-04-03 16:35:49.106999] I [MSGID: 114057] [client-handshake.c:1451:select_server_supported_programs] 0-glustervol1-client-1: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-04-03 16:35:49.109591] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-glustervol1-client-1: Connected to glustervol1-client-1, attached to remote volume '/data/bricks/brick1/brick'.
[2017-04-03 16:35:49.109609] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-glustervol1-client-1: Server and Client lk-version numbers are not same, reopening the fds
[2017-04-03 16:35:49.109713] I [MSGID: 108005] [afr-common.c:4756:afr_notify] 0-glustervol1-replicate-0: Subvolume 'glustervol1-client-1' came back up; going online.
[2017-04-03 16:35:49.110987] I [fuse-bridge.c:4146:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.24 kernel 7.22
[2017-04-03 16:35:49.111004] I [fuse-bridge.c:4831:fuse_graph_sync] 0-fuse: switched to graph 0
[2017-04-03 16:35:49.112283] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-glustervol1-client-1: Server lk version = 1
[2017-04-03 16:35:52.547781] I [rpc-clnt.c:1964:rpc_clnt_reconfig] 0-glustervol1-client-0: changing port to 49152 (from 0)
[2017-04-03 16:35:52.558003] I [socket.c:348:ssl_setup_connection] 0-glustervol1-client-0: peer CN = <name>
[2017-04-03 16:35:52.558015] I [socket.c:351:ssl_setup_connection] 0-glustervol1-client-0: SSL verification succeeded (client: <ip>:24007)
[2017-04-03 16:35:52.558167] I [MSGID: 114057] [client-handshake.c:1451:select_server_supported_programs] 0-glustervol1-client-0: Using Program GlusterFS 3.3, Num (1298437), Version (330)
[2017-04-03 16:35:52.558592] I [MSGID: 114046] [client-handshake.c:1216:client_setvolume_cbk] 0-glustervol1-client-0: Connected to glustervol1-client-0, attached to remote volume '/data/bricks/brick1/brick'.
[2017-04-03 16:35:52.558604] I [MSGID: 114047] [client-handshake.c:1227:client_setvolume_cbk] 0-glustervol1-client-0: Server and Client lk-version numbers are not same, reopening the fds
[2017-04-03 16:35:52.558781] I [MSGID: 114035] [client-handshake.c:202:client_set_lk_version_cbk] 0-glustervol1-client-0: Server lk version = 1

b0bu
  • This has literally just fixed itself... after 5 hours of hair pulling, now they mount on reboot. I changed nothing. FML. – b0bu Apr 03 '17 at 16:57
  • ^ LOL Seriously, though, I've had this same issue when using server nodes as clients. It's a race condition, and the fstab is pretty stupid. I solved the issue by creating a systemd unit file for mounting glusterfs via FUSE that declared a dependency on the volume daemon being started first (a rough sketch of that approach follows this comment list). – Spooler Apr 03 '17 at 17:24
  • One could also get away with using the automounter for this. In fact, since it's a network filesystem, you probably should be using autofs. – Spooler Apr 03 '17 at 17:25
  • The initial idea was "does it work?" After all this trouble it feels a bit sketchy for our production use case. We have Ceph, but to get this in place took less than an hour, and then a few more of me screaming profanities. – b0bu Apr 04 '17 at 13:44
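For reference, a minimal sketch of the systemd-unit approach Spooler describes above, assuming the gluster1:/glustervol1 volume and /data/glusterfs mount point from the question (the unit file name must match the mount path, and the ordering on glusterd.service is the whole point; treat this as untested):

# /etc/systemd/system/data-glusterfs.mount
[Unit]
Description=GlusterFS client mount for /data/glusterfs
Requires=glusterd.service
After=glusterd.service network-online.target
Wants=network-online.target

[Mount]
What=gluster1:/glustervol1
Where=/data/glusterfs
Type=glusterfs
Options=defaults,_netdev

[Install]
WantedBy=multi-user.target

Enable it with systemctl daemon-reload && systemctl enable data-glusterfs.mount, and drop the corresponding line from fstab so the two definitions don't fight. Note that ordering after glusterd.service still doesn't guarantee the local brick process has registered its port (that is exactly the portmap error in the log above), so the automount-based answer below may be the more robust option.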

2 Answers

1

1st: the recommended solution

Perhaps you could try

ip:/volume /dir glusterfs defaults,noauto,x-systemd.automount,x-systemd.device-timeout=30,_netdev 0 0

Refer to the ArchWiki: fstab#Remote filesystem.
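If you use that fstab entry on a systemd distribution (substituting the gluster1:/glustervol1 and /data/glusterfs values from the question), the generated automount unit still has to be picked up; a rough sequence, with the unit name assumed from the mount path, would be:

# re-read fstab and start the generated automount unit
systemctl daemon-reload
systemctl start data-glusterfs.automount

# the gluster volume is then mounted lazily on first access
ls /data/glusterfs

The noauto + x-systemd.automount combination means nothing blocks at boot; the FUSE mount is only attempted when the path is first touched, by which time glusterd and the brick are normally up.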

Because my OS is CentOS 6.9, without systemd, this does not work for me. (Maybe there are equivalent options for init; please tell me if you know of any. :) )
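For what it's worth, on a sysvinit-based CentOS 6 box the _netdev entries in fstab are normally mounted by the netfs init script once the network is up, so one thing to try, assuming the stock init scripts and that glusterd's start priority puts it before netfs in your runlevel, is simply:

# CentOS 6 / sysvinit: make sure glusterd and netfs both run at boot
chkconfig glusterd on
chkconfig netfs on

If glusterd happens to start after netfs, the boot-time mount will still race, and the rc.local fallback below remains the blunt fix.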

2nd: problem description

I had added the rule to fstab, but glusterfs could not be mounted automatically after boot. Version 3.10.

If I execute mount -a, the filesystem can be mounted.

Looking at the log file /var/log/boot.log, I found that mounting the filesystem had failed.

Looking at the log file /var/log/glusterfs/<your gluster volume name>.log, it said that connecting to the gluster server had failed (but pinging the server was fine).

I think that maybe the network was not ready when the mount was attempted?

3rd: my inelegant solution

I searched many issues, blogs, and forums, but the problem was not solved...

In the end, I gave up and added these commands to /etc/rc.local:

sleep 30s
mount -a

This solution is ugly (maybe), but the world is beautiful again after a system reboot.
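A slightly less brittle variation on the same rc.local idea, sketched here as a hypothetical replacement for the fixed sleep: poll for glusterd's management port (24007) before running mount -a, so the wait is only as long as it needs to be.

# /etc/rc.local snippet: wait up to ~60s for glusterd's management port, then mount
for i in $(seq 1 30); do
    netstat -ltn 2>/dev/null | grep -q ':24007 ' && break
    sleep 2
done
mount -a

Even this only proves glusterd is listening, not that the local brick has registered its port, so retrying mount -a once or twice would make it more forgiving.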

Levin
-1
#gluster volume set glustervol1 performance.cache-size 32MB

Set your read cache to use less memory.

Jenny D