You can check /proc/$PID/stack on the client to see the whole stack of the process, which would give you some more information about what the process is doing (ptlrpc_set_wait() is just the generic "wait for RPC completion" function).
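If you want to grab that stack from a script, for example to sample it a few times and see whether the process is stuck in the same place, something like the following is a minimal sketch. The helper names read_kernel_stack and function_names are made up for illustration, and the frame format the regex expects ("[<addr>] func+0xoff/0xsize [module]") is an assumption about your kernel's output. Reading the file normally needs root.

```python
import re
from pathlib import Path


def read_kernel_stack(pid, proc_root="/proc"):
    """Return the lines of /proc/<pid>/stack, or [] if the file is gone.

    proc_root is a parameter only so the function can be pointed at a
    copy of the file; on a live client leave it at "/proc".
    """
    stack_file = Path(proc_root) / str(pid) / "stack"
    try:
        text = stack_file.read_text()
    except (FileNotFoundError, ProcessLookupError):
        # Process already exited, or this kernel does not expose the file.
        return []
    return [line.strip() for line in text.splitlines() if line.strip()]


def function_names(frames):
    """Pull the bare function name out of each frame (assumed format)."""
    names = []
    for frame in frames:
        match = re.search(r"\]\s+([A-Za-z_][\w.]*)\+0x", frame)
        if match:
            names.append(match.group(1))
    return names
```

Calling function_names(read_kernel_stack(pid)) as root gives the call chain from the innermost frame outward, so whatever sits below ptlrpc_set_wait in that list is the operation that issued the RPC.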
That said, what is more likely to be useful is to check the kernel console error messages (dmesg and/or /var/log/messages) to see what is going on. Lustre is definitely not shy about logging errors when there is a problem.
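Since those logs can be noisy, it may help to filter them down to the Lustre lines first. Here is a rough sketch that relies on the "Lustre:", "LustreError:", "LNet:" and "LNetError:" prefixes used on console messages. The function names lustre_messages and first_error are hypothetical helpers, not part of any Lustre tool.

```python
import re

# Console prefixes used by Lustre and its networking layer (LNet).
LUSTRE_TAG = re.compile(r"\b(LustreError|Lustre|LNetError|LNet):")


def lustre_messages(log_lines, errors_only=False):
    """Yield the log lines that carry a Lustre or LNet prefix.

    log_lines can be any iterable of strings, e.g. an open
    /var/log/messages file or the split output of dmesg.
    """
    for line in log_lines:
        match = LUSTRE_TAG.search(line)
        if match is None:
            continue
        if errors_only and not match.group(1).endswith("Error"):
            continue
        yield line.rstrip("\n")


def first_error(log_lines):
    """Return the earliest LustreError/LNetError line, or None."""
    return next(lustre_messages(log_lines, errors_only=True), None)
```

For example, first_error(open("/var/log/messages")) returns the first error line in the file, which is usually the most useful one to search for, since later errors are often just fallout from it.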
Very likely this will show that the client is waiting on a server to complete the RPC, so you'll also have to check dmesg and/or /var/log/messages on the server to see what the problem is there. There are several existing docs that go into detail about how to debug Lustre issues.
At that point, you are probably best off searching the existing Lustre bugs at https://jira.whamcloud.com/ for the first error messages that are reported, or maybe a stack trace. It is very likely (depending on what error is being hit) that there is already a fix available, and upgrading to the latest maintenance release (2.12.7 currently) or applying a patch (if the bug was recently fixed) will solve your problem.