
My setup:

Three machines: two clients and one server. The server runs Red Hat 7.6; the clients run Ubuntu Server 18.04.2.

The mount is a few dozen TB, and the files placed on it from the clients are a few hundred GB each.

The behavior I'm seeing is as follows:

A file gets placed on machine1. Another service then sees it appear on machine2 and tries to append something to it. However, the file on machine2 is not yet complete and this fails.

So there's some sort of latency going on there and I wonder how I should deal with that.

I get the following output when checking the configuration of the mounts (anonymized):

machine1:~$ nfsstat -m
/mnt/dirA from <SERVER_IP>:/dirA
 Flags: rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=1,acregmax=1,acdirmin=1,acdirmax=1,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=<LOCALHOST_IP>,local_lock=none,addr=<SERVER_IP>

/mnt/dirB from <SERVER_IP>:/dirB
 Flags: rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=1,acregmax=1,acdirmin=1,acdirmax=1,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=<LOCALHOST_IP>,local_lock=none,addr=<SERVER_IP>

machine1:~$ cat /proc/mounts | grep <SERVER_IP>
<SERVER_IP>:/dirA /mnt/dirA nfs4 rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=1,acregmax=1,acdirmin=1,acdirmax=1,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=<LOCALHOST_IP>,local_lock=none,addr=<SERVER_IP> 0 0
<SERVER_IP>:/dirB /mnt/dirB nfs4 rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=1,acregmax=1,acdirmin=1,acdirmax=1,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=<LOCALHOST_IP>,local_lock=none,addr=<SERVER_IP> 0 0



machine2:~$ nfsstat -m
/mnt/dirA from <SERVER_IP>:/dirA
 Flags: rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=1,acregmax=1,acdirmin=1,acdirmax=1,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=<LOCALHOST_IP>,local_lock=none,addr=<SERVER_IP>

/mnt/dirB from <SERVER_IP>:/dirB
 Flags: rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=1,acregmax=1,acdirmin=1,acdirmax=1,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=<LOCALHOST_IP>,local_lock=none,addr=<SERVER_IP>

machine2:~$ cat /proc/mounts | grep <SERVER_IP>
<SERVER_IP>:/dirA /mnt/dirA nfs4 rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=1,acregmax=1,acdirmin=1,acdirmax=1,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=<LOCALHOST_IP>,local_lock=none,addr=<SERVER_IP> 0 0
<SERVER_IP>:/dirB /mnt/dirB nfs4 rw,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,acregmin=1,acregmax=1,acdirmin=1,acdirmax=1,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=<LOCALHOST_IP>,local_lock=none,addr=<SERVER_IP> 0 0

Which all seems pretty normal.

My question is: how can I measure this latency, perhaps monitor it, and deal with it? Is it inevitable? Is there a way to make it so the file won't be shown until complete? Or is there a better solution I haven't considered yet?

KdgDev
  • This is not just an issue with network filesystems; it is a race condition that also happens entirely locally on a single system. The two applications involved must synchronize themselves some other way than simply checking to see if the file exists. – Michael Hampton Aug 08 '19 at 17:49
  • And in addition to what Michael said: a common solution is locking. One application takes a write lock, and the second application can only write after the first has released its lock and the second has successfully claimed its own write lock. – HBruijn Aug 08 '19 at 22:19
  • @MichaelHampton So my options are: get the external applications to check a source of truth about the file in question, or remount with sync enabled? I read that sync carries a big performance hit, but it would mean the server only acknowledges data after it has been written out, if what I read is correct. – KdgDev Aug 09 '19 at 08:49
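
The locking approach suggested in the comments could look roughly like the sketch below. It assumes both the producer and the appending service can be changed to take the same advisory lock, which may not hold for an external application. The function name `locked_append` is hypothetical. The mounts shown use `local_lock=none`, so byte-range locks taken with `fcntl.lockf` should be passed to the NFS server rather than kept on the client, but that is worth verifying on the real setup. Advisory locks only help if every writer uses them, and there is still a small window between creating the file and acquiring the lock.

```python
import fcntl
import os


def locked_append(path, chunks):
    """Append chunks to path while holding an exclusive advisory lock.

    Hypothetical helper: both the producer and the appender would call this,
    so the appender blocks until the producer has finished and unlocked.
    """
    with open(path, "ab") as f:
        # Blocks until no other process holds a conflicting lock.
        fcntl.lockf(f, fcntl.LOCK_EX)
        try:
            for chunk in chunks:
                f.write(chunk)
            # Push the data out before other clients are allowed in.
            f.flush()
            os.fsync(f.fileno())
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
```

On machine1 the producer would call `locked_append(path, data_chunks)` for the full file, and the service on machine2 would call the same function for its append, which then waits until the producer's lock is released.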
