In a multithreaded program, how can one effectively profile threads that are waiting on a lock, are sleeping or are scheduled out in some other way?
For my profiling purposes I need insight into lock contention, so I would like to see it in, for example, a stack trace profiler from which flame graphs can be generated. I first tried the gperftools CPU profiler. But, as the name suggests, it only profiles threads that are actually running on a CPU, so the stack traces of threads waiting on a lock never show up.
So I switched to perf, hoping it would be powerful enough to also gather profiling information on the scheduled-out threads. But so far without luck.
Here is my test program:
```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

void *threadfunction(void *arg)
{
    pthread_mutex_t *mutex = arg;
    pthread_mutex_lock(mutex);  /* blocks until main releases the mutex */
    printf("I am the thread function!\n");
    pthread_mutex_unlock(mutex);
    return NULL;
}

int main(void)
{
    pthread_mutex_t mutex;
    pthread_mutex_init(&mutex, NULL);
    pthread_mutex_lock(&mutex);  /* hold the lock so threadfunction has to wait */

    pthread_t thread;
    (void) pthread_create(&thread, NULL, threadfunction, &mutex);

    /* keep the main thread busy on the CPU while the other thread waits */
    for (int i = 0; i < 60000000; i++)
        printf("I am the main function!\n");

    pthread_mutex_unlock(&mutex);
    (void) pthread_join(thread, NULL);
    return 0;
}
```
Now I run the following `perf record` command:
```shell
perf record -F50 --call-graph dwarf test_program
```
I turn this into a flame graph with:
```shell
perf script > perf.script
stackcollapse-perf.pl perf.script | flamegraph.pl > flamegraph.svg
```
The resulting flame graph basically only shows the stacks belonging to the main function. `threadfunction` runs in a different thread. Because it is waiting for a lock and is therefore scheduled out, none of its stack traces appear.
I have also tried adding scheduler tracepoint events to `perf record`:
```shell
-e sched:sched_stat_sleep,sched:sched_switch
```
But that did not help either.
How can I effectively record a lock-contention-based stack trace profile combined with a CPU-based stack trace profile using perf? I would like to use perf, but I am very much open to other tool suggestions as well.
For instance, I know that the gdb-based poor man's profiler can actually get the stack traces of sleeping threads. That makes sense, because gdb must somehow be able to get stack information from any thread. But I would prefer a more specialized, dedicated tool for this task than gdb.