I've got a performance problem with a large application written in C++. The program uses only 150% CPU, while the server is a 24-core hyperthreaded EPYC and other, similar applications can reliably hit the expected 4800% CPU load. iotop shows virtually no I/O, which is expected.
As the program is apparently neither I/O-bound nor CPU-bound, I checked strace and found that the vast majority of traced calls are waits on a single futex. That is to say: 48 of the 50 threads in the program appear to be blocked on the same futex, which explains quite well why the CPU load only barely exceeds 100%.
Example:
[pid 11581] futex(0x55acec47a900, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 11580] futex(0x55acec47a900, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 11579] futex(0x55acec47a900, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 11578] futex(0x55acec47a900, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 11577] futex(0x55acec47a900, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
[pid 11576] futex(0x55acec47a900, FUTEX_WAIT_PRIVATE, 2, NULL <unfinished ...>
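To make the symptom concrete, here is a minimal sketch of the kind of pattern I assume is hiding somewhere in the code base. It is not taken from the real application: `g_lock`, `g_counter`, `worker` and `run_workers` are hypothetical names, and the real critical section is certainly more complex. Many threads that serialize on one `std::mutex` like this should end up sleeping on the same futex address, much like the trace above.

```cpp
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real program: every worker serializes on one mutex.
std::mutex g_lock;            // the single contended lock
std::uint64_t g_counter = 0;  // shared state protected by g_lock

// Each iteration takes the same lock, so only one thread makes progress at a time.
void worker(int iterations) {
    for (int i = 0; i < iterations; ++i) {
        // Threads that cannot acquire the lock are put to sleep by the kernel.
        std::lock_guard<std::mutex> guard(g_lock);
        ++g_counter;  // work done while holding the lock
    }
}

// Starts `threads` workers, waits for all of them, and returns the final counter.
std::uint64_t run_workers(int threads, int iterations) {
    g_counter = 0;
    std::vector<std::thread> pool;
    pool.reserve(static_cast<std::size_t>(threads));
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back(worker, iterations);
    }
    for (auto& th : pool) {
        th.join();
    }
    return g_counter;
}
```

Calling `run_workers(48, 100000)` from a small `main` and watching the process with `strace -f` should show the same picture on a small scale. In this toy case the lock is obvious; in the real program I do not know which of the many mutexes it is.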
Now the problem for me is: how do I find the offending code? The program is not deadlocked, just slow, so the usual techniques for finding deadlocks do not work.