You might be interested in Valgrind and its tool Callgrind.
valgrind --trace-children=yes --tool=callgrind -v ./program
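It will generate a detailed call graph in a file (callgrind.out.<pid> by default) containing, among other things, the cost of each function. Note that Callgrind measures this cost in executed instructions rather than in wall-clock time.
Then you can view all of that with kcachegrind, which is a nice UI for visualizing the data.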
kcachegrind
It lets you see every function that called pthread_mutex_lock() (or any other function) and, among them, the top callers by percentage of cost.
The most useful part of Callgrind is that you can easily find bottlenecks in a single-threaded program: you just look for the function that used the most CPU.
In a multithreaded program, a function waiting a long time for something (a mutex, for example) is a normal condition, so it's more difficult. Time spent blocked also executes almost no instructions, so it barely shows up in Callgrind's cost figures.
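To try the Callgrind commands above on something small, here is a minimal sketch of a program in which several threads contend for one mutex. All names in it (run_contended_demo, contended_worker, counter_lock, the thread and iteration counts) are made up for illustration. It is only a toy to look at in kcachegrind, not a realistic workload.

```c
/* Hypothetical demo: several threads increment one counter under one mutex. */
#include <assert.h>
#include <pthread.h>

#define NUM_THREADS 4
#define ITERATIONS 100000

static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter = 0;

/* Each worker takes the lock once per iteration, so pthread_mutex_lock()
   appears in the call graph with contended_worker() as its caller. */
static void *contended_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&counter_lock);
        shared_counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts the workers, waits for all of them, and returns the final counter. */
long run_contended_demo(void)
{
    pthread_t threads[NUM_THREADS];

    shared_counter = 0;
    for (int i = 0; i < NUM_THREADS; i++) {
        int rc = pthread_create(&threads[i], NULL, contended_worker, NULL);
        assert(rc == 0);
        (void)rc; /* silences the unused warning when built with NDEBUG */
    }
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    return shared_counter;
}
```

Call run_contended_demo() from your main(), build with something like gcc -g -pthread, and run the binary under the callgrind command shown earlier. In kcachegrind, pthread_mutex_lock() should then be listed with contended_worker() among its callers.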
You can also use Helgrind, another Valgrind tool, which helps find errors in your use of mutexes (potential deadlocks or potential data races).
I guess that it analyses your calls to synchronization functions and the data you read and write, in order to detect potential problems (ones that might occur only once in 1,000,000 runs), by checking whether your data accesses are properly ordered by your synchronization. (I repeat: I guess.)
valgrind --tool=helgrind --suppressions=$PWD/supp --gen-suppressions=yes --db-attach=yes --track-lockorders=no ./program
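As an illustration of the kind of problem Helgrind is meant to catch, here is a minimal sketch of a counter updated by two threads, once without a lock and once with one. The names (run_race_demo, unlocked_worker, locked_worker, race_lock) are hypothetical and not taken from any real code base.

```c
/* Hypothetical demo: the same counter updated with and without a mutex. */
#include <assert.h>
#include <pthread.h>

#define INCREMENTS 50000

static pthread_mutex_t race_lock = PTHREAD_MUTEX_INITIALIZER;
static long race_counter = 0;

/* Racy version: reads and writes race_counter without holding any lock. */
static void *unlocked_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++)
        race_counter++;
    return NULL;
}

/* Safe version: every access to race_counter happens under race_lock. */
static void *locked_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&race_lock);
        race_counter++;
        pthread_mutex_unlock(&race_lock);
    }
    return NULL;
}

/* Runs two threads with the chosen worker and returns the final counter.
   Pass 0 for the racy workers, nonzero for the locked ones. */
long run_race_demo(int use_lock)
{
    pthread_t a, b;
    void *(*worker)(void *) = use_lock ? locked_worker : unlocked_worker;
    int rc;

    race_counter = 0;
    rc = pthread_create(&a, NULL, worker, NULL);
    assert(rc == 0);
    rc = pthread_create(&b, NULL, worker, NULL);
    assert(rc == 0);
    (void)rc; /* silences the unused warning when built with NDEBUG */
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return race_counter;
}
```

Built with gcc -g -pthread and called from main() as run_race_demo(0), this should make Helgrind report a possible data race on race_counter. With run_race_demo(1) that report should go away, and the result is always 2 * INCREMENTS.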
And the core feature of Valgrind, checking for memory leaks (the Memcheck tool, which is the default):
valgrind --leak-check=yes -v --db-attach=yes ./program
Two remarks on the options used above: --db-attach only exists in older Valgrind releases (it was removed in 3.11), so with a newer version leave it out and use --vgdb-error=0 together with gdb instead. And --track-lockorders=no in the Helgrind command turns off the lock-order check, so remove it if you want the potential-deadlock reports.
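For the leak check, a minimal sketch of what a leak looks like may help. Both functions below are hypothetical examples: leaky_length() forgets to free its buffer, and fixed_length() is the same code with the missing free() added.

```c
/* Hypothetical demo: the same helper with and without a memory leak. */
#include <stdlib.h>
#include <string.h>

/* Copies text into a heap buffer and returns its length,
   but never frees the buffer. */
size_t leaky_length(const char *text)
{
    size_t len = strlen(text);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return 0;
    memcpy(copy, text, len + 1);
    size_t result = strlen(copy);
    /* Missing free(copy): the only pointer to the block is lost here. */
    return result;
}

/* Same logic, but the buffer is released before returning. */
size_t fixed_length(const char *text)
{
    size_t len = strlen(text);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return 0;
    memcpy(copy, text, len + 1);
    size_t result = strlen(copy);
    free(copy);
    return result;
}
```

If you call leaky_length() from main() and build with gcc -g -O0 (so that the allocation is not optimized away), the leak check should list the block allocated in leaky_length() as lost, with a stack trace pointing at the malloc() call. fixed_length() should produce no such report.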