
I have created a clustered and replicated file system across 2 nodes in AWS EC2 using the following link as a guide:

http://www.gluster.org/category/aws-en/

  • I am using 2 nodes in AWS EC2
  • I am using Ubuntu 13.10 (Saucy)
  • I have installed glusterfs-server from the ppa:semiosis/ubuntu-glusterfs-3.4 repository
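
For context, the setup from that guide boils down to probing the peer and creating a two-brick replicated volume. A minimal sketch of those steps (the hostnames, brick path, and volume name here are placeholders, not my actual values):

# on the first node: add the second node to the trusted pool
gluster peer probe node2.example.com

# create a 2-way replicated volume with one brick per node
gluster volume create <volume name> replica 2 node1.example.com:/export/brick1 node2.example.com:/export/brick1

# start the volume and mount it on a client
gluster volume start <volume name>
mount -t glusterfs node1.example.com:/<volume name> /mnt/gluster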

It installs and configures very easily and works great - until I reboot a node. After setting everything up I rebooted a single node just to verify that everything comes back up, but it never does. It all works only until the first reboot: once a node is rebooted, glusterfs-server will not start and I have to recreate the entire instance from scratch.

I've pored over the logs in /var/log/glusterfs, run glusterd in foreground mode, etc., but nothing jumps out at me. There are errors displayed, but Google isn't much help. Here's the output of running glusterd in the foreground:

root@aws:/var/log/glusterfs# /usr/sbin/glusterd -N -p /var/run/glusterd.pid
librdmacm: couldn't read ABI version.
librdmacm: assuming: 4
CMA: unable to get RDMA device list
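
If it helps with debugging, glusterd can also be run in the foreground at a higher log level (a sketch, assuming the standard glusterd 3.4 options):

/usr/sbin/glusterd -N --log-level=DEBUG -p /var/run/glusterd.pid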

The error log captures a struggling startup that ultimately ends in a shutdown, but I have not been able to determine a cause or a solution:

[2014-04-16 19:58:09.925937] E [glusterd-store.c:2487:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
[2014-04-16 19:58:09.925968] E [xlator.c:390:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2014-04-16 19:58:09.926003] E [graph.c:292:glusterfs_graph_init] 0-management: initializing translator failed
[2014-04-16 19:58:09.926019] E [graph.c:479:glusterfs_graph_activate] 0-graph: init failed
[2014-04-16 19:58:09.926392] W [glusterfsd.c:1002:cleanup_and_exit] (-->/usr/sbin/glusterd(main+0x3df) [0x7f801961d8df] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xb0) [0x7f80196206e0] (-->/usr/sbin/glusterd(glusterfs_process_volfp+0x103) [0x7f80196205f3]))) 0-: received signum (0), shutting down
[2014-04-16 20:40:20.992287] I [glusterfsd.c:1910:main] 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.4.3 (/usr/sbin/glusterd -N -p /var/run/glusterd.pid)
[2014-04-16 20:40:20.996223] I [glusterd.c:961:init] 0-management: Using /var/lib/glusterd as working directory
[2014-04-16 20:40:20.997685] I [socket.c:3480:socket_init] 0-socket.management: SSL support is NOT enabled
[2014-04-16 20:40:20.997713] I [socket.c:3495:socket_init] 0-socket.management: using system polling thread
[2014-04-16 20:40:20.999231] W [rdma.c:4197:__gf_rdma_ctx_create] 0-rpc-transport/rdma: rdma_cm event channel creation failed (No such device)
[2014-04-16 20:40:20.999268] E [rdma.c:4485:init] 0-rdma.management: Failed to initialize IB Device
[2014-04-16 20:40:20.999284] E [rpc-transport.c:320:rpc_transport_load] 0-rpc-transport: 'rdma' initialization failed
[2014-04-16 20:40:20.999435] W [rpcsvc.c:1389:rpcsvc_transport_create] 0-rpc-service: cannot create listener, initing the transport failed
[2014-04-16 20:40:23.858537] I [glusterd-store.c:1339:glusterd_restore_op_version] 0-glusterd: retrieved op-version: 2
[2014-04-16 20:40:23.869829] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-0
[2014-04-16 20:40:23.869880] E [glusterd-store.c:1858:glusterd_store_retrieve_volume] 0-: Unknown key: brick-1
[2014-04-16 20:40:25.611295] E [glusterd-utils.c:4990:glusterd_friend_find_by_hostname] 0-management: error in getaddrinfo: Name or service not known
[2014-04-16 20:40:25.612154] E [glusterd-utils.c:284:glusterd_is_local_addr] 0-management: error in getaddrinfo: Name or service not known
[2014-04-16 20:40:25.612190] E [glusterd-store.c:2487:glusterd_resolve_all_bricks] 0-glusterd: resolve brick failed in restore
[2014-04-16 20:40:25.612221] E [xlator.c:390:xlator_init] 0-management: Initialization of volume 'management' failed, review your volfile again
[2014-04-16 20:40:25.612239] E [graph.c:292:glusterfs_graph_init] 0-management: initializing translator failed
[2014-04-16 20:40:25.612254] E [graph.c:479:glusterfs_graph_activate] 0-graph: init failed
[2014-04-16 20:40:25.612628] W [glusterfsd.c:1002:cleanup_and_exit] (-->/usr/sbin/glusterd(main+0x3df) [0x7fef3d7c58df] (-->/usr/sbin/glusterd(glusterfs_volumes_init+0xb0) [0x7fef3d7c86e0] (-->/usr/sbin/glusterd(glusterfs_process_volfp+0x103) [0x7fef3d7c85f3]))) 0-: received signum (0), shutting down
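
The "error in getaddrinfo" lines suggest glusterd cannot resolve a peer or brick hostname it has stored on disk. One way to see exactly which names it is trying to resolve, based on the /var/lib/glusterd working directory shown in the log above (the volume name is a placeholder):

# hostnames of the peers in the trusted pool
grep hostname /var/lib/glusterd/peers/*

# hostnames recorded for each brick of the volume
grep hostname /var/lib/glusterd/vols/<volume name>/bricks/*

# check whether those names still resolve after the reboot
getent hosts <hostname from the files above>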

I found one thread on the gluster-users list that matches up, but it goes unresolved:

http://www.gluster.org/pipermail/gluster-users/2013-October/037687.html

If anyone can provide any wisdom - it would be much appreciated.

jriffel73

2 Answers


Try stopping the volume:

gluster volume stop <volume name>

Then restart it with the "force" option to rebuild the metadata on a per-brick basis:

gluster volume start <volume name> force
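
Note that this assumes glusterd itself is running; a quick way to check before and after (the volume name is a placeholder, and glusterfs-server is the Ubuntu service name mentioned in the question):

service glusterfs-server status
gluster volume status <volume name>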
ajd
    You can't issue gluster commands if the service is down, as in the OP's case. If DNS changes you can manually replace the DNS or IP information within the gluster metadata, but I'd highly recommend backing up the /var/lib/glusterd directory first: https://www.gluster.org/pipermail/gluster-users/2015-June/022264.html – DevOops Jan 06 '17 at 16:46

For future reference - I was not using the fully qualified domain names for the peer connections. I was using only the host names, and I had edited /etc/resolv.conf to search our DNS suffix. Upon reboot, resolv.conf is rewritten by the DHCP client, which breaks DNS resolution of the peers. Apparently, if the peer DNS names do not resolve at all, the services will not even start, which could be considered a bug. I think the services should always start regardless.
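
A couple of ways to keep the peer names resolving across reboots (a sketch, assuming the stock ISC dhclient on Ubuntu; the domain, hostnames, and addresses below are placeholders): either pin the search suffix so the DHCP client keeps writing it into resolv.conf, or sidestep DNS entirely with /etc/hosts entries for the peers.

# /etc/dhcp/dhclient.conf
supersede domain-search "example.internal";

# /etc/hosts
10.0.0.11   gluster1.example.internal gluster1
10.0.0.12   gluster2.example.internal gluster2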

jriffel73