11

I'm setting up my first Gluster 3.4 install and all is good up until I want to create a distributed replicated volume.

I have 4 servers 192.168.0.11, 192.168.0.12, 192.168.0.13 & 192.168.0.14.

From 192.168.0.11 I ran:

gluster peer probe 192.168.0.12
gluster peer probe 192.168.0.13
gluster peer probe 192.168.0.14

On each server I have a mounted storage volume at /export/brick1

I then ran on 192.168.0.11

gluster volume create gv0 replica 2 192.168.0.11:/export/brick1 192.168.0.12:/export/brick1 192.168.0.13:/export/brick1 192.168.0.14:/export/brick1

But I get the error:

volume create: gv0: failed: Host 192.168.0.11 is not in 'Peer in Cluster' state

Sure enough, if you run gluster peer status it shows 3 peers with the other connected hosts, i.e. Number of Peers: 3

Hostname: 192.168.0.12
Port: 24007
Uuid: bcea6044-f841-4465-88e4-f76a0c8d5198
State: Peer in Cluster (Connected)

Hostname: 192.168.0.13
Uuid: 3b5c188e-9be8-4d0f-a7bd-b738a88f2199
State: Peer in Cluster (Connected)

Hostname: 192.168.0.14
Uuid: f6f326eb-0181-4f99-8072-f27652dab064
State: Peer in Cluster (Connected)

But, from 192.168.0.12, the same command also shows 3 hosts, and 192.168.0.11 is one of them, i.e.

Number of Peers: 3

Hostname: 192.168.0.11
Port: 24007
Uuid: 09a3bacb-558d-4257-8a85-ca8b56e219f2
State: Peer in Cluster (Connected)

Hostname: 192.168.0.13
Uuid: 3b5c188e-9be8-4d0f-a7bd-b738a88f2199
State: Peer in Cluster (Connected)

Hostname: 192.168.0.14
Uuid: f6f326eb-0181-4f99-8072-f27652dab064
State: Peer in Cluster (Connected)

So 192.168.0.11 is definitely part of the cluster.

The question is: why am I not able to create the volume on the first Gluster server when running the gluster command there? Is this normal behaviour or some sort of bug?

hookenz

2 Answers

18

I was seeing an obscure error message about an unconnected socket with peer 127.0.0.1.

[2013-08-16 00:36:56.765755] W [socket.c:1494:__socket_proto_state_machine] 0-socket.management: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:1022)

It turns out the problem I was having was due to NAT. I was trying to create Gluster servers that were behind a NAT device and use the public IPs to resolve the names. That simply can't work for the local machine: a node behind NAT generally cannot reach itself through its own public address.

What I had was something like the following on each node.

A hosts file containing

192.168.0.11  gluster1
192.168.0.12  gluster2
192.168.0.13  gluster3
192.168.0.14  gluster4

The fix was to remove the trusted peers first:

sudo gluster peer detach gluster2
sudo gluster peer detach gluster3
sudo gluster peer detach gluster4

Then change the hosts file on each machine to be

# Gluster1
127.0.0.1     gluster1
192.168.0.12  gluster2
192.168.0.13  gluster3
192.168.0.14  gluster4


# Gluster2
192.168.0.11  gluster1
127.0.0.1     gluster2
192.168.0.13  gluster3
192.168.0.14  gluster4

etc
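Since the per-node files differ only in which name maps to 127.0.0.1, they can be generated instead of hand-edited. A minimal sketch (the gen_hosts helper is hypothetical, not part of Gluster; the names and addresses are the ones from this setup):

```shell
#!/usr/bin/env bash
# Print the /etc/hosts fragment for one node: its own name resolves to
# 127.0.0.1, every other node keeps its LAN address.
gen_hosts() {
    local self="$1"
    local names=(gluster1 gluster2 gluster3 gluster4)
    local addrs=(192.168.0.11 192.168.0.12 192.168.0.13 192.168.0.14)
    local i
    for i in "${!names[@]}"; do
        if [ "${names[$i]}" = "$self" ]; then
            printf '127.0.0.1     %s\n' "${names[$i]}"
        else
            printf '%s  %s\n' "${addrs[$i]}" "${names[$i]}"
        fi
    done
}

# e.g. the fragment for Gluster2
gen_hosts gluster2
```

Running it per node and appending the output to each machine's /etc/hosts reproduces the layout above.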

Then peer probe again, and finally create the volume, which then succeeded.

I doubt that using IP addresses (the public ones) will work in this case. It should work if you use the private addresses behind your NAT. In my case, each server was behind a NAT in the AWS cloud.

Adrián Deccico
hookenz
  • In my case I didn't have to touch 127.0.0.1; working with the internal IP addresses was enough – arod Jul 05 '15 at 15:56
  • Stumbled upon the same issue. The solution for me was to set an entry for each node in /etc/hosts that resolves to the local LAN address of the node itself. One entry on each node is enough to make the whole cluster work using DNS and hostnames. Anyway, this is the correct answer. – iDoc Jan 17 '23 at 05:44
1

Try explicitly defining the replica count as four nodes, using this format:

gluster volume create NEW-VOLNAME [stripe COUNT] [replica COUNT] [transport <tcp | rdma>] NEW-BRICK ...

I assume this is a pure replica and no stripe?

Try this from 192.168.0.11.

Detach everything first:

sudo gluster peer detach 192.168.0.12
sudo gluster peer detach 192.168.0.13
sudo gluster peer detach 192.168.0.14

Then re-create the volume in this format:

gluster volume create gv0 replica 4 transport tcp 192.168.0.11:/export/brick1 192.168.0.12:/export/brick1 192.168.0.13:/export/brick1 192.168.0.14:/export/brick1

Note that I have explicitly defined this as a four-node replica set, and explicitly set the transport to tcp.
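The command string follows a mechanical pattern, so for illustration here is a throwaway sketch (build_create_cmd is a hypothetical helper; it only prints the command, it does not run gluster) showing how it expands for any brick list:

```shell
#!/usr/bin/env bash
# Assemble (but do not execute) a "gluster volume create" command for a
# list of hosts, each contributing the brick /export/brick1.
build_create_cmd() {
    local volname="$1" replica="$2"
    shift 2
    local cmd="gluster volume create $volname replica $replica transport tcp"
    local host
    for host in "$@"; do
        cmd="$cmd $host:/export/brick1"
    done
    printf '%s\n' "$cmd"
}

# Expands to the four-node replica command shown above.
build_create_cmd gv0 4 192.168.0.11 192.168.0.12 192.168.0.13 192.168.0.14
```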

Should you wish to stripe across two devices in a replica set, you would use something like this:

gluster volume create gv0 stripe 2 replica 2 transport tcp 192.168.0.11:/export/brick1 192.168.0.12:/export/brick1 192.168.0.13:/export/brick1 192.168.0.14:/export/brick1

Keep at it. I discovered Gluster recently and I love this approach to distributed filesystems... a real piece of art.

I use Gluster to provide HA redundancy for KVM virtual datastores. Magic stuff.
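For reference, attaching a client to such a volume looks roughly like the following. The volume name and mount point are examples, not from this thread, and the run wrapper just prints each command when DRY_RUN=1 so the sequence can be shown without a live cluster:

```shell
#!/usr/bin/env bash
# run: with DRY_RUN=1 print the command instead of executing it.
run() { if [ "${DRY_RUN:-0}" = "1" ]; then printf '%s\n' "$*"; else "$@"; fi; }

DRY_RUN=1
run mkdir -p /mnt/vmstore
run mount -t glusterfs 192.168.0.11:/gv0 /mnt/vmstore
```

The FUSE client learns the full brick list from whichever server it first contacts, which is what gives the HA behaviour for the datastore.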

AngryWombat
  • Unfortunately I get exactly the same error. Also, when not specifying a replica count and with all volumes currently detached, I get the same error. Removing the 192.168.0.11 brick, it then claims the 192.168.0.12 host is not in the 'Peer in Cluster' state, so you do need to probe them first. At least this is the case in version 3.4. – hookenz Aug 15 '13 at 22:49
  • You might be right in suggesting it's just a quirk in the latest version. The fact that you are seeing all three peers in the set indicates the bricks are all working regardless of the errors on 192.168.0.11. What happens when you mount the share from a fifth test node and write to the GlusterFS? Does the write appear on all bricks? – AngryWombat Aug 15 '13 at 23:48
  • Actually I can't even create a normal distributed volume on a single brick. I just get an error that says it couldn't be created, and the logs offer nothing useful. It makes me feel like throwing it out altogether. – hookenz Aug 15 '13 at 23:50
  • I had a similar issue 5 weeks ago; moving to v3.3 resolved my problem. The only other suggestion at this stage would be to consider rolling back to 3.3 and retesting. – AngryWombat Aug 15 '13 at 23:56
  • Also, perhaps just start with two nodes and work up from there... What distro are you using? I got mine running on Ubuntu 12.04 with this repo: sudo add-apt-repository ppa:semiosis/ubuntu-glusterfs-3.3 – AngryWombat Aug 15 '13 at 23:59
  • I'd been using the 3.4 repository on 12.04. I'm rolling back to the default repository and testing. I was hoping to use 3.4 as it claims a big performance boost for virtual machine images. – hookenz Aug 16 '13 at 00:03
  • Also, can you post your peer info from x.x.x.11: /etc/glusterd/peers/*, /etc/glusterd/gluster.info, /etc/glusterfs/glusterd.vol – AngryWombat Aug 16 '13 at 00:06
  • I heard the same, but after helping you with this I think I will leave it for now... To be honest, 3.3 is working a treat for me. – AngryWombat Aug 16 '13 at 00:07