sssd_nss is the daemon that abstracts user/group information requests from downstream services such as LDAP. It doesn't actually do the lookup itself; rather, it makes the request to the service that does, first checking a local disk cache.
This makes me think that the heavy I/O portions are doing a lot of operations around users and groups (e.g. looking up the username for a UID, or looking up the groups for a UID).
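To make "operations around users and groups" concrete, here is a minimal Python sketch of those lookups. It assumes a Linux host whose /etc/nsswitch.conf lists sss for passwd and group, so glibc routes these calls to sssd_nss. The helper names describe_uid and time_lookups are hypothetical; pwd.getpwuid, os.getgrouplist and grp.getgrgid are standard-library calls. Timing a loop of them gives a rough per-lookup latency you can compare across hosts or kernels.

```python
import grp
import os
import pwd
import time


def describe_uid(uid: int) -> tuple[str, list[str]]:
    """Resolve a UID to its username and group names via NSS.

    Hypothetical helper. On a host whose /etc/nsswitch.conf lists "sss",
    each call below may be answered by sssd_nss (from cache or via LDAP).
    """
    entry = pwd.getpwuid(uid)  # UID -> passwd entry (username, primary GID)
    # getgrouplist returns the primary GID plus all supplementary GIDs.
    gids = os.getgrouplist(entry.pw_name, entry.pw_gid)
    names = []
    for gid in gids:
        try:
            names.append(grp.getgrgid(gid).gr_name)  # GID -> group name
        except KeyError:
            names.append(str(gid))  # GID without a group entry
    return entry.pw_name, names


def time_lookups(uid: int, repeats: int = 1000) -> float:
    """Return the average wall-clock seconds per describe_uid() call."""
    start = time.perf_counter()
    for _ in range(repeats):
        describe_uid(uid)
    return (time.perf_counter() - start) / repeats
```

For example, describe_uid(0) should return root and its groups, and time_lookups(os.getuid()) reports the average time per lookup for your own account. Repeated lookups are likely served from caches, so this measures the warm path rather than a cold LDAP query.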
You should also look into whether the high sssd_nss CPU usage is mostly IOWAIT. That would indicate that you are indeed doing a lot of user/group queries and that they are somehow being held up by disk I/O. You can use top to see the overall system IOWAIT (look for the wa value) and iotop to get per-process metrics.
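top's wa figure is computed from the cumulative CPU counters in /proc/stat. If you want to log it from a script during a build, a minimal sketch (Linux only; read_cpu_times, iowait_percent and sample_iowait are hypothetical helper names) is to sample the aggregate cpu line twice and compute the iowait share of the elapsed CPU time. proc(5) notes that the iowait counter is not fully reliable, so treat the result as an indicator rather than an exact measurement.

```python
import time

# Field order of the aggregate "cpu" line in /proc/stat (see proc(5)).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def read_cpu_times(path: str = "/proc/stat") -> list[int]:
    """Return the cumulative jiffies from the aggregate 'cpu' line."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                # Keep only the first 8 fields: guest/guest_nice are already
                # included in user/nice, so adding them would double count.
                return [int(v) for v in line.split()[1:1 + len(FIELDS)]]
    raise RuntimeError("no aggregate cpu line in " + path)


def iowait_percent(before: list[int], after: list[int]) -> float:
    """Share of CPU time spent in iowait between two samples (like top's 'wa')."""
    # Clamp at zero because the iowait counter can occasionally go backwards.
    deltas = [max(a - b, 0) for a, b in zip(after, before)]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return 100.0 * deltas[FIELDS.index("iowait")] / total


def sample_iowait(interval: float = 1.0) -> float:
    """Measure the system-wide iowait percentage over `interval` seconds."""
    before = read_cpu_times()
    time.sleep(interval)
    return iowait_percent(before, read_cpu_times())
```

Calling sample_iowait(5.0) while the build is running gives the iowait percentage over a five-second window, which you can log alongside the per-process view from iotop.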
If it is primarily IOWAIT, you may need to add I/O capacity or separate your build volumes from your system volumes. That said, I have my doubts that this is the root cause of your issue.
You mention this started happening after the Meltdown/Spectre patches. That may indicate the build process is triggering a lot of system calls in sssd_nss, which are now slower with those patches applied. You may want to look into your build process and see whether it runs unnecessary user/group-related commands. You can inspect the system calls being made with strace -p $pid_of_sssd_nss, or use sysdig for even fancier analysis. If that service is making a lot of system calls, look into which calls it is making, figure out where your build process is initiating them, and then try to minimize them.
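To tie sssd_nss activity back to specific build steps, you can also trace the build side instead of the daemon. The sketch below assumes the build was run under strace -f -e trace=execve,connect -o build.trace followed by your build command (without -t timestamps, so each line starts with the PID), and that sssd's NSS responder listens on /var/lib/sss/pipes/nss, the default on many distributions; adjust NSS_SOCKET if yours differs. nss_connects_by_program is a hypothetical helper that counts connections to that socket per program. Lookups answered from sssd's client-side memory cache may never open the socket, and a client may reuse one connection for several requests, so treat the counts as a lower bound on which programs reach sssd_nss.

```python
import re
from collections import Counter

# Assumed default path of the sssd NSS responder socket.
NSS_SOCKET = "/var/lib/sss/pipes/nss"

# With `strace -f -o FILE`, each line starts with the PID of the traced task.
LINE_RE = re.compile(r"^(\d+)\s+(\w+)\((.*)")
EXECVE_RE = re.compile(r'^"([^"]*)"')  # first execve argument: program path


def nss_connects_by_program(lines):
    """Count connect() calls to the sssd NSS socket, grouped by program.

    `lines` is any iterable of strace output lines. Each PID is attributed
    to the last program it tried to execve(); PIDs seen before any execve
    (e.g. forked children that never exec) are reported as "pid <N>".
    """
    program_of = {}  # pid -> program path from its latest execve attempt
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # "<... resumed>" lines, signals, exit notices
        pid, syscall, rest = m.groups()
        if syscall == "execve":
            exe = EXECVE_RE.match(rest)
            if exe:
                program_of[pid] = exe.group(1)
        elif syscall == "connect" and NSS_SOCKET in rest:
            counts[program_of.get(pid, f"pid {pid}")] += 1
    return counts
```

For example, opening build.trace and passing the file object to nss_connects_by_program, then calling .most_common(10) on the result, lists the ten programs that connected to sssd_nss most often, which is a good starting point for finding user/group lookups you can cut from the build.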