I have an odd case where one of the two replicated GlusterFS bricks goes offline and takes all of the client mounts down with it. As I understand it, this should not happen: the clients should fail over to the brick that is still online, but that hasn't been the case. I suspect this is due to a configuration issue.

Here is a description of the system:

  • 2 gluster servers on dedicated hardware (gfs0, gfs1)
  • 8 client servers on vms (client1, client2, client3, ... , client8)

Half of the client servers are mounted with gfs0 as the primary and the other half are pointed at gfs1. Each client is mounted with the following entry in /etc/fstab:

/etc/glusterfs/datavol.vol /data glusterfs defaults 0 0

Here is the content of /etc/glusterfs/datavol.vol:

volume datavol-client-0
    type protocol/client
    option transport-type tcp
    option remote-subvolume /data/datavol
    option remote-host gfs0
end-volume

volume datavol-client-1
    type protocol/client
    option transport-type tcp
    option remote-subvolume /data/datavol
    option remote-host gfs1
end-volume

volume datavol-replicate-0
    type cluster/replicate
    subvolumes datavol-client-0 datavol-client-1
end-volume

volume datavol-dht
    type cluster/distribute
    subvolumes datavol-replicate-0
end-volume

volume datavol-write-behind
    type performance/write-behind
    subvolumes datavol-dht
end-volume

volume datavol-read-ahead
    type performance/read-ahead
    subvolumes datavol-write-behind
end-volume

volume datavol-io-cache
    type performance/io-cache
    subvolumes datavol-read-ahead
end-volume

volume datavol-quick-read
    type performance/quick-read
    subvolumes datavol-io-cache
end-volume

volume datavol-md-cache
    type performance/md-cache
    subvolumes datavol-quick-read
end-volume

volume datavol
    type debug/io-stats
    option count-fop-hits on
    option latency-measurement on
    subvolumes datavol-md-cache
end-volume

The config above is the latest attempt at making this behave properly. I have also tried the following entry in /etc/fstab:

gfs0:/datavol /data glusterfs defaults,backupvolfile-server=gfs1 0 0

This was the entry for half of the clients, while the other half had:

gfs1:/datavol /data glusterfs defaults,backupvolfile-server=gfs0 0 0

The results were exactly the same as with the configuration above: both configs connect everything just fine, but neither fails over.
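A minimal way to sanity-check the setup from one of the gluster servers, assuming the volume is named datavol as in the volfile above (the client log path is also an assumption; the FUSE client for a mount at /data usually logs to /var/log/glusterfs/data.log):

# Confirm the volume is a plain 1 x 2 replica and that both bricks show as online
gluster volume info datavol
gluster volume status datavol

# While taking one brick down, watch the client log on an affected vm to see
# whether the mount actually drops or only logs a disconnect from that one brick
tail -f /var/log/glusterfs/data.log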

Any help would be appreciated.

Shiquemano

1 Answer

It appears you have a 'cluster/distribute' block in your config, which I would think causes Gluster to treat the volume as striped. Try removing the volume and recreating it without the "stripe" option.
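If you do recreate it, here is a rough sketch of building it back as a plain two-way replica, assuming the brick paths from the volfile above (gfs0:/data/datavol and gfs1:/data/datavol). This destroys the existing volume definition, so back up first; Gluster may also refuse to reuse a brick directory from a deleted volume until its old extended attributes are cleared.

# Stop and remove the existing volume definition (data on the bricks is left in place)
gluster volume stop datavol
gluster volume delete datavol

# Recreate it as a pure replica-2 volume, with no stripe or extra distribute count
gluster volume create datavol replica 2 gfs0:/data/datavol gfs1:/data/datavol
gluster volume start datavol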

ColinM