NFS clients sometimes hang for 15 seconds

Question

NFS server: CentOS7.1 (kernel 3.10.0-229, nfs-utils 1.3.0)

Some clients are Ubuntu Precise (12.04) using NFSv3... they work fine.

The problematic clients are running CentOS7.1 using NFSv4.1 (or NFSv4.0). Most of the time, things work fine. But sometimes, writing a file will incur a 15 second timeout. These are small files (< 10KiB).

After 15 seconds, the write finishes and the file has the correct contents.

It's most noticeable when saving files with vi. It only happens about 5% of the time (but because it's so frustrating, it feels more frequent than it really is).

Today it happened when running rsync on several hundred files (2KiB - 5KiB each). I was able to run strace on the rsync process and see that it's happening over 50% of the time.

# sudo strace -ttt -T -p 14186
Process 14186 attached
1452694932.030892 select(9, [8], [], NULL, {55, 59875}) = 1 (in [8], left {44, 963900}) <10.096109>
1452694942.127262 read(8, "\4\0\0k\10\3\0\0", 8184) = 8 <0.000038>
1452694942.127378 select(9, [8], [], NULL, {60, 0}) = 1 (in [8], left {59, 997018}) <0.003029>
1452694942.130529 read(8, "\4\0\0k\t\3\0\0", 8184) = 8 <0.000040>
1452694942.130715 select(9, [8], [], NULL, {60, 0}) = 1 (in [8], left {44, 963694}) <15.036348>
1452694957.167236 read(8, "\4\0\0k\n\3\0\0", 8184) = 8 <0.000071>
1452694957.167419 select(9, [8], [], NULL, {60, 0}) = 1 (in [8], left {59, 996528}) <0.003572>
1452694957.171122 read(8, "\4\0\0k\v\3\0\0", 8184) = 8 <0.000112>
1452694957.171340 select(9, [8], [], NULL, {60, 0}) = 1 (in [8], left {44, 964372}) <15.035715>
1452694972.207210 read(8, "\4\0\0k\f\3\0\0", 8184) = 8 <0.000026>
1452694972.207303 select(9, [8], [], NULL, {60, 0}) = 1 (in [8], left {44, 960236}) <15.039908>
1452694987.247375 read(8, "\4\0\0k\r\3\0\0", 8184) = 8 <0.000111>
1452694987.247616 select(9, [8], [], NULL, {60, 0}) = 1 (in [8], left {44, 960455}) <15.039628>
1452695002.287486 read(8, "\4\0\0k\16\3\0\0", 8184) = 8 <0.000100>
1452695002.287665 select(9, [8], [], NULL, {60, 0}) = 1 (in [8], left {59, 996177}) <0.004000>
1452695002.291819 read(8, "\4\0\0k\17\3\0\0", 8184) = 8 <0.000089>
1452695002.292014 select(9, [8], [], NULL, {60, 0}) = 1 (in [8], left {44, 964982}) <15.035132>
1452695017.327303 read(8, "\4\0\0k\20\3\0\0", 8184) = 8 <0.000082>
1452695017.327491 select(9, [8], [], NULL, {60, 0}) = 1 (in [8], left {59, 995793}) <0.004300>
1452695017.331931 read(8, "\4\0\0k\21\3\0\0", 8184) = 8 <0.000052>
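
For anyone wanting to pick the stalls out of a longer trace, here is a minimal sketch that filters `strace -ttt -T` output by the elapsed time in the trailing `<...>` field. The function name `slow_calls` and the 1-second threshold are my own choices for illustration, not part of strace, and the regex assumes lines shaped like the ones above.

```python
import re

# Matches one line of `strace -ttt -T` output:
#   <epoch timestamp> <syscall>(<args>) = <result> <<elapsed seconds>>
LINE_RE = re.compile(r"^(\d+\.\d+)\s+(\w+)\(.*<(\d+\.\d+)>\s*$")


def slow_calls(lines, threshold=1.0):
    """Return (timestamp, syscall, elapsed) for calls slower than threshold seconds.

    `threshold` is an arbitrary cutoff chosen for this sketch.
    """
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue  # skip non-syscall lines such as "Process 14186 attached"
        ts, name, elapsed = float(m.group(1)), m.group(2), float(m.group(3))
        if elapsed > threshold:
            hits.append((ts, name, elapsed))
    return hits


# A few lines copied from the trace above.
sample = [
    "Process 14186 attached",
    r'1452694942.127378 select(9, [8], [], NULL, {60, 0}) = 1 (in [8], left {59, 997018}) <0.003029>',
    r'1452694942.130529 read(8, "\4\0\0k\t\3\0\0", 8184) = 8 <0.000040>',
    r'1452694942.130715 select(9, [8], [], NULL, {60, 0}) = 1 (in [8], left {44, 963694}) <15.036348>',
]

for ts, name, elapsed in slow_calls(sample):
    print(f"{ts:.6f} {name} took {elapsed:.3f}s")
```

On the sample lines only the third `select` is reported. Note that here the slow call is rsync waiting on its other process, so the trace shows where the wait is observed, not which filesystem operation causes it.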

I haven't seen anyone else asking about this problem. Am I alone?

Edit:

Mount options:

nfs:/storage on /space type nfs4 (rw,relatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=14,retrans=2,sec=sys,clientaddr=10.12.32.4,local_lock=none,addr=10.12.32.31)

Contents of /etc/exports:

/storage 10.0.0.0/8(rw,async,no_root_squash,no_subtree_check,mp=/storage)

Can you post what you have in the /etc/exports so I can see the options that are set. — Gmck, Jan 13 '16 at 14:58
Is there anything NFS or RPC related in either `/var/log/messages` or the output from `dmesg` on the server or any of the clients? — Andrew Henle, Jan 13 '16 at 17:33

score 0 · Accepted Answer · answered Jan 21 '16 at 14:04

Running strace on the other rsync child process showed that it was hanging during a rename() call. That led me to this bug:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=799838

I've worked around it by setting the noatime mount option.
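
To check whether a given client actually has the option applied, something like the following sketch can parse a line of `mount` output and report whether `noatime` is present. The helper name `mount_options` is hypothetical, and the regex assumes the `<source> on <target> type <fstype> (<options>)` layout shown in the question.

```python
import re

# Matches one line of `mount` output: <source> on <target> type <fstype> (<options>)
MOUNT_RE = re.compile(r"^(\S+) on (\S+) type (\S+) \((.*)\)$")


def mount_options(line):
    """Split a `mount` output line into (target, fstype, set of option names)."""
    m = MOUNT_RE.match(line.strip())
    if m is None:
        raise ValueError(f"not a mount line: {line!r}")
    target, fstype, opts = m.group(2), m.group(3), m.group(4)
    # Keep only the option names, e.g. "vers=4.0" becomes "vers".
    names = {opt.split("=", 1)[0] for opt in opts.split(",")}
    return target, fstype, names


# The mount line from the question, taken before the workaround was applied.
line = ("nfs:/storage on /space type nfs4 (rw,relatime,vers=4.0,rsize=1048576,"
        "wsize=1048576,namlen=255,hard,proto=tcp,port=0,timeo=14,retrans=2,sec=sys,"
        "clientaddr=10.12.32.4,local_lock=none,addr=10.12.32.31)")

target, fstype, names = mount_options(line)
print(target, fstype, "noatime" in names)  # prints: /space nfs4 False
```

The line from the question still shows `relatime`, so the check reports `False`; after remounting with the workaround, `noatime` should appear in the option list instead.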
