
On server01 I have installed and configured glusterfs-server and glusterfs-client to replicate the directory /var/appdata to server02.

It seems that everything works fine, but I'm not sure that I understand the whole thing.

  • Is directory /var/gfs_appdata just a view onto /var/appdata, i.e. are all files generated in /var/appdata replicated to server02, or must my application store all generated files in /var/gfs_appdata?
  • Directory /var/gfs_appdata doesn't hold any physical data, correct?
  • At what point does file01, which is generated on server01, appear on server02? When does replication take place?

On server01 the glusterfs volume is mounted via /etc/fstab:

/etc/glusterfs/glusterfs.vol              /var/gfs_appdata/  glusterfs  defaults  0  0
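One caveat with mounting from fstab: on some distributions the mount is attempted before networking is up, and the glusterfs mount then fails silently at boot. A common workaround (assuming your distribution's init scripts honour it) is to mark the entry as a network filesystem with the _netdev option:

```
/etc/glusterfs/glusterfs.vol              /var/gfs_appdata/  glusterfs  defaults,_netdev  0  0
```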

On server01 and server02, glusterfs-server is started automatically at boot time with /etc/glusterfs/glusterfsd.vol:

volume posix1
  type storage/posix
  option directory /var/appdata
end-volume

volume locks1
    type features/locks
    subvolumes posix1
end-volume

volume brick1
    type performance/io-threads
    option thread-count 8
    subvolumes locks1
end-volume

volume server-tcp
    type protocol/server
    option transport-type tcp
    option auth.addr.brick1.allow *
    option transport.socket.listen-port 6996
    option transport.socket.nodelay on
    subvolumes brick1
end-volume
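With both servers booted, you can sanity-check that each glusterfsd is actually accepting TCP connections on port 6996 before debugging at the filesystem level. A minimal sketch using bash's /dev/tcp pseudo-device (the addresses and port are the ones from the client volfile below; adjust to your setup):

```shell
#!/bin/bash
# check_port HOST PORT: succeed if HOST:PORT accepts a TCP connection.
# Uses bash's /dev/tcp redirection with a 2-second timeout.
check_port() {
    timeout 2 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Hosts/port taken from the glusterfs.vol client volfile in this question.
for host in 192.168.0.1 192.168.0.2; do
    if check_port "$host" 6996; then
        echo "$host:6996 reachable"
    else
        echo "$host:6996 NOT reachable"
    fi
done
```

If a brick shows as not reachable, check glusterfsd on that server and any firewall between the two hosts before looking at the volfiles.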

/etc/glusterfs/glusterfs.vol:

# RAID 1
# TRANSPORT-TYPE tcp
volume data01
    type protocol/client
    option transport-type tcp
    option remote-host 192.168.0.1
    option transport.socket.nodelay on
    option remote-port 6996
    option remote-subvolume brick1
end-volume

volume data02
    type protocol/client
    option transport-type tcp
    option remote-host 192.168.0.2
    option transport.socket.nodelay on
    option remote-port 6996
    option remote-subvolume brick1
end-volume

volume mirror-0
    type cluster/replicate
    subvolumes data01 data02
end-volume

volume readahead
    type performance/read-ahead
    option page-count 4
    subvolumes mirror-0
end-volume

volume iocache
    type performance/io-cache
    option cache-size `echo $(( $(grep 'MemTotal' /proc/meminfo | sed 's/[^0-9]//g') / 5120 ))`MB
    option cache-timeout 1
    subvolumes readahead
end-volume
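A note on the cache-size line above: the backticked `echo $(( ... ))` is shell syntax, which glusterfsd's volfile parser will not evaluate by itself; it only produces a sensible value if the volfile is generated by a shell script beforehand. Expanded by hand, the expression reserves roughly one fifth of total RAM for the io-cache (MemTotal in /proc/meminfo is in kB, and kB / 5120 = MB / 5):

```shell
# Compute one fifth of total RAM in MB, as the backtick expression would,
# and print the volfile line it is meant to produce.
mem_kb=$(grep 'MemTotal' /proc/meminfo | sed 's/[^0-9]//g')
cache_mb=$(( mem_kb / 5120 ))
echo "option cache-size ${cache_mb}MB"
```

If your volfile is not script-generated, replace the backtick expression with a literal value such as `option cache-size 256MB`.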

volume quickread
    type performance/quick-read
    option cache-timeout 1
    option max-file-size 64kB
    subvolumes iocache
end-volume

volume writebehind
    type performance/write-behind
    option cache-size 4MB
    subvolumes quickread
end-volume

volume statprefetch
    type performance/stat-prefetch
    subvolumes writebehind
end-volume
Alex

1 Answer


OK, looks like this is the actual question:

At what point does file01, which is generated on server01, appear on server02? When does replication take place?

Replication starts as soon as the file is created, changed or deleted on server01. Exactly how long it takes to complete depends on storage I/O, network bandwidth and how much data needs to be replicated.

The way I've used glusterfs, the files that live in the gluster volume are usually small, so replicating a new file is almost instantaneous.

Update: As for whether you should write directly to the brick (/var/appdata) or to the mount (/var/gfs_appdata): the way I understand it, you should always use the mount for both reading and writing. Honestly, I don't know exactly why this is the case; a (now ex-) colleague did a lot of testing with glusterfs about a year ago before we started using it, and I haven't acquainted myself with the finer details.
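To make the distinction concrete, this is the usage pattern described above, using the paths from the question (illustrative only; assumes the volume is mounted on /var/gfs_appdata):

```shell
# Correct: write through the glusterfs mount, so the replicate
# translator sees the operation and mirrors it to both bricks.
echo "some data" > /var/gfs_appdata/file01

# Wrong: writing straight into the brick bypasses glusterfs entirely;
# the file lands only on this server's disk and is never replicated.
echo "some data" > /var/appdata/file01
```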

Here's an answer on a similar question which gives a bit of detail explaining why it should be done that way: Can Apache Read The GlusterFS Brick Directly But Write To The GlusterFS Mount?

ThatGraemeGuy