
Setup:

NFS server: NFSServerHOST
NFS Share: NFSServerHOST:/MYSHARE
NFS Client1: CLNT1
NFS Client2: CLNT2
NFS Client3: CLNT3
Client OS: RHEL 7 and 8 
Client users: User1, User2
Local Mount on Client:  /var/NFSSHARE

mount -t nfs4 NFSServerHOST:/MYSHARE /var/NFSSHARE

The mount was created successfully on all clients, and both User1 and User2 could read and write /var/NFSSHARE from all three clients. Then something happens on Client2 (we have yet to find out whether it is related to a server patch or a cron job), and User1 can no longer read or write /var/NFSSHARE, but only on Client1. User2 can still read and write the share on Client1, and both users can still read and write on Client2 and Client3.

Error while performing read/write on Client1 for User1: Remote I/O error

If we reboot Client1, the issue goes away and User1 can again perform I/O operations on the NFS share from Client1.

Some of the things we checked:

No version mismatch: both the NFS client and the NFS server are configured for NFSv4.

Nothing wrong with whitelisting: all 3 client IPs are whitelisted on the NFS server.
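Since `mount -t nfs4` lets the client and server negotiate the minor version, it may be worth confirming that all three clients ended up on the same one. The following is a minimal sketch, not something we have run in this environment: `nfs_mount_versions` is a hypothetical helper name, and it assumes the usual `/proc/mounts` field layout (device, mount point, type, options).

```shell
#!/bin/sh
# Sketch: read /proc/mounts-style lines on stdin and print
# "<mountpoint> <vers>" for every NFS mount found.
# nfs_mount_versions is a hypothetical helper name.
nfs_mount_versions() {
    awk '$3 ~ /^nfs4?$/ {
        vers = "unknown"
        n = split($4, opts, ",")          # mount options are comma separated
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^vers=/)
                vers = substr(opts[i], 6) # text after "vers="
        print $2, vers
    }'
}
```

Running `nfs_mount_versions < /proc/mounts` on each client and comparing the output would show whether Client1 differs from Client2 and Client3.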

We have checked inode usage and open files (lsof); both are well within the limits.

nfs4_getfacl /var/NFSSHARE

# file: /var/NFSSHARE
A::EVERYONE@:rwaDxtTnNcy

Running getfacl /var/NFSSHARE as User1 on Client1:

# file: var/MQHA/
# owner: nobody
# group: nobody
user::rwx
group::rwx
other::rwx

Comparing the rpcdebug logs while performing an I/O operation on Client1 (FAILURE) vs Client2 (SUCCESS):

kernel: NFS: nfs_update_inode(0:57/3963604504 fh_crc=0xbf9e74c8 ct=2 info=0x427e7f)
kernel: NFS: (0:57/3963604504) revalidation complete
kernel: NFS: permission(0:57/3963604504), mask=0x1, res=0
kernel: NFS: permission(0:57/3963604504), mask=0x3, res=0
kernel: NFS: atomic_open(0:57/3963604504), Abhi
kernel: --> nfs_put_client({2})
kernel: --> nfs4_alloc_slot used_slots=0002 highest_used=1 max_slots=1024
kernel: <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=0
The logs change after this point. Before it, the SUCCESS and FAILURE logs are more or less the same; only the numeric values differ.
Client1 (FAILURE)
    kernel: nfs4_free_slot: slotid 0 highest_used_slotid 1
    kernel: NFS: permission(0:57/3963604504), mask=0x81, res=-10 
    kernel: --> nfs4_alloc_slot used_slots=0002 highest_used=1 max_slots=1024
    kernel: <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=0
    kernel: decode_attr_type: type=00

Client2 (SUCCESS)
    kernel: decode_attr_type: type=0100000
    kernel: decode_attr_change: change attribute=7148460619683717735
    kernel: decode_attr_size: file size=0
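To read the `permission(...), mask=...` lines above, it may help to decode the mask bits. The sketch below assumes the `MAY_*` values from the Linux kernel's `include/linux/fs.h` (worth verifying against the source of the running kernel); `decode_mask` is a hypothetical helper name. Under that assumption, `mask=0x81` is `MAY_EXEC` plus `MAY_NOT_BLOCK`, and `res=-10` would be `-ECHILD`, which as far as I can tell only asks the VFS to retry the lookup in blocking mode, so it may not be the actual failure.

```shell
#!/bin/sh
# Sketch: translate a VFS permission mask into MAY_* flag names.
# Bit values are assumed from include/linux/fs.h; decode_mask is a
# hypothetical helper name.
decode_mask() {
    mask=$(( $1 ))
    out=""
    for pair in 1:MAY_EXEC 2:MAY_WRITE 4:MAY_READ 8:MAY_APPEND \
                16:MAY_ACCESS 32:MAY_OPEN 64:MAY_CHDIR 128:MAY_NOT_BLOCK; do
        bit=${pair%%:*}     # numeric bit value before the colon
        name=${pair#*:}     # flag name after the colon
        if [ $(( mask & bit )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    echo "$out"
}
```

For example, `decode_mask 0x3` prints `MAY_EXEC MAY_WRITE`, matching the second permission check in the common part of the log.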

Looking for suggestions to diagnose this issue. What more can we do to enable more verbose logging, on either the client or the server side, to learn more about the error?
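For reference, this is roughly how the client-side traces can be widened to include the SUNRPC layer as well as the NFS layer. It is a sketch under assumptions: it needs root, it relies on `rpcdebug` from nfs-utils with its `-m`, `-s` and `-c` options, and `run` and `nfs_debug` are hypothetical helper names. Setting `DRY_RUN=1` only prints the commands.

```shell
#!/bin/sh
# Sketch: switch NFS client and SUNRPC kernel debug flags on or off
# around a single reproduction of the failure.
# run and nfs_debug are hypothetical helper names.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"           # dry run: show the command only
    else
        "$@"
    fi
}

nfs_debug() {
    case "$1" in
        on)  flag=-s ;;     # -s sets debug flags
        off) flag=-c ;;     # -c clears them again
        *)   echo "usage: nfs_debug on|off" >&2; return 2 ;;
    esac
    run rpcdebug -m nfs "$flag" all
    run rpcdebug -m rpc "$flag" all
}
```

The idea would be to call `nfs_debug on`, repeat the failing read or write as User1 on Client1, call `nfs_debug off`, and then collect the kernel messages for that window. The same approach with `-m nfsd` on the server would apply only if the server is a Linux knfsd host, which is an assumption here.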

Thanks

Abhishek
  • My guess would be a user mapping error on the Client1. But you need to add to your questions any logs from the clients & server, or state that there's nothing in the logs. – root Oct 30 '22 at 09:06
