While profiling a couple of Go services, we are seeing that all of them spend 55% to 70% of their time in the runtime.futex function. These services make heavy use of goroutines, locks, and channels for synchronization and communication between goroutines. Two representative pprof top20 outputs are shown below (a sketch of how mutex/block profiling could be enabled follows the profiles).
Is this expected, meaning we should only care about the time spent in the non-"runtime" functions, or could it be an artifact of incorrect usage of goroutines/locks/channels that is causing this overhead in the futex calls?
(pprof) top20
Showing nodes accounting for 43.80s, 80.25% of 54.58s total
Dropped 654 nodes (cum <= 0.27s)
Showing top 20 nodes out of 156
      flat  flat%   sum%        cum   cum%
    31.05s 56.89% 56.89%     31.05s 56.89%  runtime.futex
     3.55s  6.50% 63.39%      4.51s  8.26%  runtime.cgocall
     2.25s  4.12% 67.52%      2.25s  4.12%  runtime.usleep
     1.14s  2.09% 69.60%      1.53s  2.80%  runtime.scanobject
     0.73s  1.34% 70.94%      1.56s  2.86%  syscall.Syscall
     0.71s  1.30% 72.24%      2.27s  4.16%  runtime.mallocgc
(pprof) top20
Showing nodes accounting for 12.49s, 95.93% of 13.02s total
Dropped 93 nodes (cum <= 0.07s)
Showing top 20 nodes out of 47
      flat  flat%   sum%        cum   cum%
     9.69s 74.42% 74.42%      9.69s 74.42%  runtime.futex
     1.25s  9.60% 84.02%      1.63s 12.52%  runtime.cgocall
     0.64s  4.92% 88.94%      0.64s  4.92%  runtime.usleep
     0.17s  1.31% 90.25%      0.22s  1.69%  runtime.timeSleepUntil
     0.11s  0.84% 91.09%      0.17s  1.31%  runtime.scanobject
     0.08s  0.61% 91.71%      0.08s  0.61%  runtime.nanotime1
     0.08s  0.61% 92.32%      0.26s  2.00%  runtime.retake
     0.07s  0.54% 92.86%      0.07s  0.54%  runtime.lock2
     0.07s  0.54% 93.39%      0.13s     1%  runtime.traceEventLocked
     0.06s  0.46% 93.86%      0.36s  2.76%  runtime.notetsleep_internal
     0.05s  0.38% 94.24%      0.13s     1%  runtime.mallocgc
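For reference, here is a minimal sketch (not our actual service code) of how the runtime's mutex and block profilers could be enabled alongside net/http/pprof to attribute the contention behind the futex time. The sampling rates and port below are illustrative assumptions, not what our services currently use:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers
	"runtime"
)

func main() {
	// Report every mutex contention event (rate 1). In production a
	// higher value, e.g. 100, samples roughly 1 in 100 events to
	// reduce overhead.
	runtime.SetMutexProfileFraction(1)

	// Record every goroutine blocking event (channel sends/receives,
	// sync.Mutex waits, etc.) lasting at least 1ns.
	runtime.SetBlockProfileRate(1)

	// Expose the profiles over HTTP; 6060 is the conventional pprof port.
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	select {} // placeholder for the real service's goroutines/locks/channels
}
```

With this in place, `go tool pprof http://localhost:6060/debug/pprof/mutex` and `go tool pprof http://localhost:6060/debug/pprof/block` would show which locks and channel operations the goroutines are contending on, rather than the futex calls the runtime issues on their behalf.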