
We are observing a massive uptick in metadata requests to an NFSv4.1 (AWS EFS) network drive that is mounted on one or more web servers. This started happening about a week ago across a number of stacks.

I've done a bunch of diagnostics:

nfsiostat shows 60-450 ops/s on the servers where it is happening.
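For reference, those per-mount op rates can be sampled with something along these lines (the interval and mount path are illustrative, not necessarily the exact invocation used):

# sample the NFS client's per-mount operation counters every 5 seconds
nfsiostat 5 /path/to/mount/point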

nfstrace --mode=live --verbose=2 shows that the same operations are happening repeatedly on 2-3 filehandles:

   CALL  [ operations: 4 tag: void minor version: 1
       [ SEQUENCE(53) [ sessionid: 0xeab1201a93b669600000000000000001 sequenceid: 0xfc12c slotid: 2 cache this: 0 ]
       [ PUTFH(22) [ object: 839c9918169aed49bd6f96adab5438bc47b1460162fe4a739cd8c98405868b2b5de303ed1f4ecfa82c26f5ca1db4cce8e63aad38fe72a397bdae51d8a7e93eb118ad8c736b30323165273b9c805db85936cf3be626b6ba4165ecf9755a54fdd174a535e217ffee4fa0166feea6e86becf9fa16280fd877b3545e07fe03aede08 ]
       [ ACCESS(3) [ READ LOOKUP MODIFY EXTEND DELETE  ]
       [ GETATTR(9) [ mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY ]  ]
   REPLY [  operations: 4 status: OK tag: void
       [ SEQUENCE(53) [ status: OK session: 0xeab1201a93b669600000000000000001 sequenceid: 0xfc12c slotid: 2 highest slotid: 63 target highest slotid: 63 status flags: 0 ]
       [ PUTFH(22) [ status: OK ]
       [ ACCESS(3) [ status: OK supported: READ LOOKUP MODIFY EXTEND DELETE  access: READ LOOKUP MODIFY EXTEND DELETE  ]
       [ GETATTR(9) [ status: OK mask: 0x18300000 CHANGE SIZE TIME_METADATA TIME_MODIFY ]  ]

lsof -N shows nothing using files from the NFS drive.

iotop doesn't reveal anything obvious.

Capturing traffic with tcpdump -s 0 -w /tmp/nfs.pcap port 2049 and then loading the capture into Wireshark to decode the filehandles got me nowhere, as the decoding appears to produce nonsensical values (inodes that don't exist, etc.).

While I have been able to mitigate the impact by changing the EFS throughput mode to Elastic in AWS, the underlying problem still degrades performance and has caused production outages.
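For completeness, that throughput-mode change can be made in the console or with the AWS CLI roughly as follows (the file system ID is a placeholder):

# switch the EFS file system to Elastic throughput mode; fs-0123456789abcdef0 is a placeholder
aws efs update-file-system --file-system-id fs-0123456789abcdef0 --throughput-mode elastic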

NB: This issue looks very similar to this one: nginx webserver flooding NFS with metadata requests

UPDATE 1

I found that I was able to narrow down which files were at fault by looking at the reply packets; opening the capture with tshark (tshark -r /tmp/nfs.pcap -V) was sufficient to get file-modification timestamps:

        Opcode: GETATTR (9)
            Status: NFS4_OK (0)
            Attr mask[0]: 0x00000018 (Change, Size)
                reqd_attr: Change (3)
                    changeid: 1626
                reqd_attr: Size (4)
                    size: 38912
            Attr mask[1]: 0x00300000 (Time_Metadata, Time_Modify)
                reco_attr: Time_Metadata (52)
                    seconds: 1678928064
                    nseconds: 8000000
                reco_attr: Time_Modify (53)
                    seconds: 1678928064
                    nseconds: 8000000
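To pull just the modification timestamps out of the capture without reading the full decode, a grep over the tshark output along these lines is enough (the exact attribute labels depend on the Wireshark version, so treat this as a sketch):

# print each Time_Modify attribute plus the two lines that follow it (seconds / nseconds)
tshark -r /tmp/nfs.pcap -V | grep -A 2 'reco_attr: Time_Modify'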

I could then use find to locate files modified at that time, e.g. find /path/to/mount/point/ -newermt "15 Mar 2023" ! -newermt "17 Mar 2023" -ls | grep 13:54. What this revealed is that, again, it was seemingly stuck on certain files. While this didn't really help me track down the source of the issue, it did make it clearer that all the client was doing was repeatedly attempting to update metadata on 2-5 seemingly random files that are loaded on the home page of the website in question.
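Spelled out, the two steps were roughly as follows (the epoch value is the one from the capture above; the mount path is a placeholder):

# convert the mtime/ctime seconds from the GETATTR reply into a readable local timestamp
date -d @1678928064

# list files on the mount modified in that window, then narrow to the minute of interest
find /path/to/mount/point/ -newermt "15 Mar 2023" ! -newermt "17 Mar 2023" -ls | grep 13:54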

After some more reading I found that there is a mount option called lazytime, available as of Linux kernel 4.0. This option keeps timestamp updates in memory for longer and reduces metadata writes. I did a bit of an experiment with this: I applied the mount option via /etc/fstab on one server in the cluster and did a 'placebo' on the other.
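As a rough sketch of the change under test (the EFS DNS name, mount point and surrounding options are placeholders modelled on a typical NFSv4.1/EFS mount; adding lazytime to the option list is the only intended change):

# /etc/fstab - illustrative entry only; everything except "lazytime" mirrors the existing mount options
fs-0123456789abcdef0.efs.us-east-1.amazonaws.com:/  /path/to/mount/point  nfs4  nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport,relatime,lazytime,_netdev  0 0

Observations are as below: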

  • Server 1:
    • checked mount options - using relatime
    • checked nfs stats - 460.982 ops/s
    • updated mount options
    • unmounted and mounted the NFS drive
    • checked nfs stats - 1.400 ops/s
    • rebooted
    • checked nfs stats - 1.316 ops/s
  • Server 2:
    • checked mount options - using relatime
    • checked nfs stats - 390.998 ops/s
    • did not update mount options
    • unmounted and mounted the NFS drive
    • checked nfs stats - 1.750 ops/s
    • rebooted
    • checked nfs stats - 531.932 ops/s

Conclusions from this are:

  • Unmounting and remounting the drive stops the issue temporarily
  • The lazytime option seems to partially mitigate the issue
  • The issue appears to be boot-related, which tracks with other observations

UPDATE 2

I found a much more effective way to troubleshoot here:

# enable debugging
rpcdebug -m nfs -s all

# watch debug output
tail -f /var/log/syslog

# disable debugging
rpcdebug -m nfs -c all

This actually produces some useful output:

Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213426] NFS: nfs_update_inode(0:55/1440391441734181492 fh_crc=0xc29dd48a ct=2 info=0x26040)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213428] NFS: permission(0:55/1440391441734181492), mask=0x1, res=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213430] NFS: nfs_lookup_revalidate_done(/Components) is valid
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213432] --> nfs41_call_sync_prepare data->seq_server 000000008a198bb4
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213433] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213436] encode_sequence: sessionid=1354075206:678036945:0:16777216 seqid=238661 slotid=0 max_slotid=0 cache_this=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213842] decode_attr_type: type=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213843] decode_attr_change: change attribute=2153
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213844] decode_attr_size: file size=71680
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fsid: fsid=(0x0/0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213845] decode_attr_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_fs_locations: fs_locations done, error = 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213846] decode_attr_mode: file mode=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213847] decode_attr_nlink: nlink=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_rdev: rdev=(0x0:0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213848] decode_attr_space_used: space used=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_access: atime=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213849] decode_attr_time_metadata: ctime=1677729393
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213850] decode_attr_time_modify: mtime=1677729393
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_attr_mounted_on_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213851] decode_getfattr_attrs: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213852] decode_getfattr_generic: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213853] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213854] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs4_free_slot: slotid 1 highest_used_slotid 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213855] nfs41_sequence_process: Error 0 free the slot
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213856] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213860] NFS: nfs_update_inode(0:55/10743289300410127521 fh_crc=0xb5c7d895 ct=1 info=0x26040)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213862] NFS: permission(0:55/10743289300410127521), mask=0x1, res=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213864] NFS: nfs_lookup_revalidate_done(Components/Image.png) is valid
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.213867] NFS: dentry_delete(Components/Image.png, 48084c)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214000] NFS: permission(0:55/1440391441734181492), mask=0x81, res=-10
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214003] --> nfs41_call_sync_prepare data->seq_server 000000008a198bb4
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214004] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214005] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214009] encode_sequence: sessionid=1354075206:678036945:0:16777216 seqid=238662 slotid=0 max_slotid=0 cache_this=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214472] decode_attr_type: type=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_change: change attribute=1626
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214473] decode_attr_size: file size=38912
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214474] decode_attr_fsid: fsid=(0x0/0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214475] decode_attr_fs_locations: fs_locations done, error = 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_mode: file mode=00
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214476] decode_attr_nlink: nlink=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_rdev: rdev=(0x0:0x0)
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214477] decode_attr_space_used: space used=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214478] decode_attr_time_access: atime=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_metadata: ctime=1678928064
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214479] decode_attr_time_modify: mtime=1678928064
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_attr_mounted_on_fileid: fileid=0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214480] decode_getfattr_attrs: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214481] decode_getfattr_generic: xdr returned 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=64
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214483] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214484] nfs4_free_slot: slotid 1 highest_used_slotid 0
Mar 17 10:20:41 ip-10-1-2-84 kernel: [ 3897.214485] nfs41_sequence_process: Error 0 free the slot
  • If possible, mount your physical volume with the `noatime` or `relatime` mount option. Someone/something is probably grepping large parts of the filesystem (for example: `grep --recursive PATTERNS /mnt/nfs/*`). _"This started happening about a week ago across a number of stacks."_ Check all application changes in that time period, especially monitoring applications like Elasticsearch, etc. It may also be possible that you have an intruder/hacker in your network. – paladin Mar 16 '23 at 08:21
  • Heads up: watch your costs closely. Elastic throughput mode is *very* expensive. – Michael - sqlbot Mar 16 '23 at 13:57
  • @paladin I can't find any process associated with the file system; if someone were grepping it, it would show up under `lsof -N`. There have been application changes, but the behavior only starts after a reboot. I noticed that `cachefilesd` and `amazon-efs-utils` were updated in the last 2 weeks, but at the same time there are other stacks that use the same architecture that don't do this. It is mounted with `relatime`; `Flags: rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,hard,noresvport,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=1.2.3.4,fsc,local_lock=none,addr=4.3.2.1` – Az Ilari Mar 16 '23 at 19:29

1 Answer


Looks like this is caused by a kernel bug in linux-aws-5.4.0-1097. I rolled back to linux-aws-5.4.0-1096 and the issue disappeared, then rolled forward to linux-aws-5.4.0-1097 again, and once an asset was requested by the web server the mount started redlining at > 200 ops/s.

Based on the notes at https://www.ubuntuupdates.org/package/core/bionic/main/proposed/linux-aws-5.4, it looks like the NFS client was modified in 1097: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2003053
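If you need to confirm which kernel build you are running, or pin the known-good build while waiting for a fix, something along these lines works on Ubuntu (package names are illustrative; check what your archive actually provides):

# check the running kernel build
uname -r

# install the previous, known-good AWS kernel and hold it so it isn't upgraded away
# (you may still need to select the older kernel in GRUB, or remove the newer image)
sudo apt-get install linux-image-5.4.0-1096-aws linux-modules-5.4.0-1096-aws
sudo apt-mark hold linux-image-5.4.0-1096-aws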
