4

my epoll_wait() is consuming too much CPU, a simple strace shows that:

    strace -c -f -p 3655
    Process 3655 attached with 5 threads
    ^CProcess 3655 detached
    Process 3656 detached
    Process 3657 detached
    Process 3658 detached
    Process 3659 detached
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     64.43   32.228205      596819        54           epoll_wait
     35.31   17.661939          17   1063312    195547 futex
      0.26    0.131803           0    448140           gettimeofday
      0.00    0.000090           3        27           write
      0.00    0.000050           1        54           epoll_ctl
      0.00    0.000000           0        54        27 read
      0.00    0.000000           0         9           sendmsg
      0.00    0.000000           0        90        54 recvmsg
    ------ ----------- ----------- --------- --------- ----------------
    100.00   50.022087               1511740    195628 total

there are just 54 epoll_wait() calls, but usecs/call of epoll_wait() is 596819, why?

Vince.Wu
  • 870
  • 1
  • 10
  • 17

1 Answers1

2

In this case the time reported by strace is a bit misleading.

strace(1) tells us:

-c Count time, calls, and errors for each system call and report a summary on program exit. On Linux, this attempts to show system time (CPU time spent running in the kernel) independent of wall clock time.

But I suspect what it's actually counting is just the time from when epoll_wait was called until it returned. This doesn't mean your process was "running" in the kernel. It's far more likely it was sleeping, which is natural for epoll_wait.

cnicutar
  • 178,505
  • 25
  • 365
  • 392
  • I'm not very sure how strace count the CPU time, but the process actually did use 130% of the CPU in a 2-core machine. – Vince.Wu Dec 04 '17 at 10:26
  • @Wenlin.Wu This is interesting. Do you have a minimal program to try it out? `epoll_wait` should just put the process on a wait queue; spending a significant amount of CPU time sounds like a bug. – cnicutar Dec 04 '17 at 10:54
  • indeed, it's my Bind DNS service, and it's a known bug of Bind, see this https://bugzilla.redhat.com/show_bug.cgi?id=709205 , I just don't know how this happens. – Vince.Wu Dec 04 '17 at 11:49
  • @Wenlin.Wu Fascinating bug report. I don't think the conclusion there is that `epoll_wait` is at fault; much more likely is that the following happens: 1. `named` gets into trouble and starts using a lot of CPU 2. You (naturally) `strace` it to find out what it's doing 3. `strace` misleads you with `epoll_wait` times - classic red herring. I suggest trying something else to measure CPU utilization such as http://www.brendangregg.com/perf.html. – cnicutar Dec 04 '17 at 14:20