High CPU usage by sssd_nss during heavy disk IO

Question

I'm on Oracle Enterprise Linux 7u2 where I perform frequent, heavy maven builds which generate a large number of jars/wars/ears. What I've noticed recently (after some of the meltdown / spectre patches) is very heavy CPU utilization by this process:

/usr/libexec/sssd/sssd_nss --uid 0 --gid 0 --debug-to-files

When my server is idle? No problems. But during the heavy disk IO portions of my maven builds, the maven java process and sssd_nss fight over CPU, each taking about 50% of the total. (For reference, I have a 4 core Xeon server)

I don't really know what this process is (except that it might deal with LDAP?) or why it would care about Java file copying and zipping. (This is all on local, non-NFS disk.)

score 3 · Answer 1 · answered Apr 07 '18 at 06:48

sssd_nss is the SSSD daemon that answers user/group information requests on behalf of backend services such as LDAP. It doesn't do the backend lookup itself; it first checks a local disk cache and otherwise hands the request to the service that does the lookup.

This makes me think that the heavy I/O portions of your build are doing a lot of user and group operations (e.g. looking up the username for a UID, or looking up the groups for a UID).
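To confirm that such lookups can reach sssd_nss at all, it may help to check which sources the name service switch consults. This is a minimal sketch: `nss_sources` is a hypothetical helper name, and it assumes the usual `database: source ...` layout of /etc/nsswitch.conf.

```shell
# Print the sources configured for one NSS database (passwd, group, ...).
# $1 = database name, $2 = config file (defaults to /etc/nsswitch.conf).
nss_sources() {
    awk -v db="$1:" '$1 == db { $1 = ""; sub(/^ +/, ""); print }' \
        "${2:-/etc/nsswitch.conf}"
}
```

If `nss_sources passwd` or `nss_sources group` lists `sss`, lookups that the earlier sources cannot answer are passed on to SSSD. Timing a single lookup with `time getent passwd "$(id -u)"` gives a rough idea of what each one costs.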

You should also check whether the load you are seeing is really IOWAIT. That would indicate that you are indeed doing a lot of user/group queries and that they are somehow being held up by disk I/O. You can use top to see the overall system IOWAIT (look for the wa value in the CPU summary line), and iotop to get per-process I/O metrics.
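If you would rather log the number than watch top, the same figure can be derived from /proc/stat. The sketch below assumes the standard Linux layout of the aggregate `cpu` line (user, nice, system, idle, iowait, irq, softirq, steal); `iowait_between` is a hypothetical helper name and the snapshot paths are placeholders.

```shell
# Print the iowait share (percent) of CPU time between two /proc/stat snapshots.
# $1 = earlier snapshot, $2 = later snapshot (plain copies of /proc/stat).
iowait_between() {
    awk '$1 == "cpu" {
             t = 0
             for (i = 2; i <= 9; i++) t += $i      # user .. steal
             if (NR == FNR) { t0 = t; w0 = $6; next }   # first file: remember baseline
             if (t > t0) printf "%.1f\n", 100 * ($6 - w0) / (t - t0)
         }' "$1" "$2"
}

# Example use during a build (paths are placeholders):
#   cat /proc/stat > /tmp/stat.before; sleep 10; cat /proc/stat > /tmp/stat.after
#   iowait_between /tmp/stat.before /tmp/stat.after
```

A value near zero while sssd_nss is busy would suggest the time is being spent on CPU rather than waiting on the disk.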

If it is primarily IOWAIT, you may need to add I/O capacity or separate your build volumes from your system volumes. That said, I have my doubts that this is the root cause of your issue.

You mention this started after the Meltdown/Spectre patches. That may indicate the build process is triggering a lot of system calls in sssd_nss, which are now slower with those patches. You may want to look into your build process and see if there are unnecessary user/group related commands. You can inspect the system calls being made using strace -p $pid_of_sssd_nss, or use sysdig for even fancier analysis. If that service is doing a lot of system calls, look at which calls it is making and figure out where your build process is initiating them. Then try to minimize them.
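Raw strace output gets long quickly, so a per-call tally is usually easier to read. The following is a rough sketch, assuming the trace was written to a file with strace's -f and -o options; `top_syscalls` is a hypothetical helper name and the trace path is a placeholder.

```shell
# Count system calls by name in an strace log, most frequent first.
# $1 = log file written with: strace -f -o FILE -p PID
top_syscalls() {
    # Strip the optional leading PID, keep only the name before "(",
    # and skip lines that are not call entries (signals, "resumed", exits).
    sed -n 's/^[0-9]* *\([a-z_0-9][a-z_0-9]*\)(.*/\1/p' "$1" |
        sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Example use as root while a build is running (path is a placeholder):
#   strace -f -o /tmp/sssd_nss.trace -p "$(pgrep -x sssd_nss)"   # stop with Ctrl-C
#   top_syscalls /tmp/sssd_nss.trace | head
```

For a quick look without a log file, strace's -c option prints a similar summary when the trace is interrupted.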
