
I have a GlusterFS (3.12.1) cluster of 3 nodes.

Step 1: remove a node (node2)

from node1

# gluster volume remove-brick swarm-data replica 2 node2:/glusterfs/swarm-data force  
# gluster peer detach node2  
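
For reference, the result of this step can be checked from node1 (standard gluster commands; swarm-data and node2 are the names used in this setup):

# gluster peer status             # node2 should no longer be listed
# gluster volume info swarm-data  # should now report 1 x 2 = 2 bricks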

Step 2: clear the node

from node2

# rm -rf /glusterfs/swarm-data  
# mkdir /glusterfs/swarm-data
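
If the old brick directory were reused instead of wiped, leftover GlusterFS extended attributes could block re-adding it; a quick sanity check that the recreated directory is clean (standard getfattr usage, assuming the same brick path):

# getfattr -d -m . -e hex /glusterfs/swarm-data   # should show no trusted.glusterfs.* or trusted.gfid attributes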

Then I performed the maintenance job on the node.

Step 3: re-add the node

from node1

# gluster peer probe node2  
# gluster volume add-brick swarm-data replica 3 node2:/glusterfs/swarm-data force
volume add-brick: failed: Commit failed on node2. Please check log
file for details.

The logs show:

failed to get the port number for remote subvolume. Please run 'gluster volume status' on server to see if brick process is running.
E [MSGID: 108006] [afr-common.c:5001:__afr_handle_child_down_event] 0-swarm-data-replicate-0: All subvolumes are down. Going offline until atleast one of them comes back up.
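
For the "check log file for details" part, the relevant logs on node2 are normally under /var/log/glusterfs/ (default locations; the brick log file name is derived from the brick path):

# less /var/log/glusterfs/glusterd.log
# less /var/log/glusterfs/bricks/glusterfs-swarm-data.log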

Next:

# gluster volume status
Status of volume: swarm-data
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick node1:/glusterfs/swarm-data           49152     0          Y       31216
Brick node3:/glusterfs/swarm-data           49152     0          Y       2373
Brick node2:/glusterfs/swarm-data           N/A       N/A        N       N/A
Self-heal Daemon on localhost               N/A       N/A        Y       27293
Self-heal Daemon on node3                   N/A       N/A        Y       20268
Self-heal Daemon on node2                   N/A       N/A        Y       7568

Task Status of Volume swarm-data
------------------------------------------------------------------------------
There are no active volume tasks

=> The TCP port is N/A for node2!
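
A port of N/A usually means the brick process itself is not running; this can be confirmed on node2 with the standard per-brick daemon name, glusterfsd:

# pgrep -af glusterfsd   # no output would confirm the brick daemon is not running on node2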

Next:

# gluster volume info swarm-data

Volume Name: swarm-data
Type: Replicate
Volume ID: 0edd8275-8d39-4e95-abc8-9f028c2098a7
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: node1:/glusterfs/swarm-data
Brick2: node3:/glusterfs/swarm-data
Brick3: node2:/glusterfs/swarm-data
Options Reconfigured:
auth.allow: 127.0.0.1
transport.address-family: inet
nfs.disable: on

Node2 is listed, but no data is syncing.

Nodes 1 and 3 have port 49152 listening, but node 2 does not, as shown by:

netstat -an | grep LISTEN

Can you help me?

user2452092

2 Answers


Check name resolution (DNS or the hosts file). Check whether the glusterd service on node2 is started or not. If glusterd does not start, post the log.
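
For example (standard commands; node2 is the hostname from the question):

# getent hosts node2           # name resolution as the resolver sees it
# systemctl status glusterd    # run on node2: is the management daemon active?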

  • Thank you for your answer. It's OK, I can create another volume on my 3 nodes, but still not for my first volume. I did not try to stop the volume before adding a node (I'm on a production server). – user2452092 Nov 28 '17 at 12:22
  • Yes. GlusterFS is an HA and failover storage system: you do not stop services. If you create a new volume you must migrate all data from the old volume to the new volume. Another trick: check /var/lib/glusterd to see if you have old config. – Alessandro Secchi Nov 28 '17 at 12:47

I have occasionally encountered this situation, too. You could try restarting the GlusterFS management service and then check again:

systemctl restart glusterd
gluster volume status
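
If the node2 brick still shows Online: N after the restart, forcing a volume start is another common way to respawn a missing brick process, and a full heal can then resync the data (a sketch reusing the volume name from the question):

gluster volume start swarm-data force
gluster volume heal swarm-data full
gluster volume status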
i_chips