90

I use the pthread library to write multithreaded C programs. My threads frequently hang waiting on mutexes. When I use the strace utility and find a thread in the FUTEX_WAIT state, I want to know which thread holds that mutex at that moment, but I don't know how to find out. Are there any utilities that can do that?

Someone told me the Java virtual machine supports this, so I want to know whether Linux supports this feature as well.

Alexis Wilke
terry

4 Answers

148

You can use knowledge of the mutex internals to do this. Ordinarily this wouldn't be a very good idea, but it's fine for debugging.

Under Linux with the NPTL implementation of pthreads (which is any modern glibc), you can examine the __data.__owner member of the pthread_mutex_t structure to find out the thread that currently has it locked. This is how to do it after attaching to the process with gdb:

(gdb) thread 2
[Switching to thread 2 (Thread 0xb6d94b90 (LWP 22026))]#0  0xb771f424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb771f424 in __kernel_vsyscall ()
#1  0xb76fec99 in __lll_lock_wait () from /lib/i686/cmov/libpthread.so.0
#2  0xb76fa0c4 in _L_lock_89 () from /lib/i686/cmov/libpthread.so.0
#3  0xb76f99f2 in pthread_mutex_lock () from /lib/i686/cmov/libpthread.so.0
#4  0x080484a6 in thread (x=0x0) at mutex_owner.c:8
#5  0xb76f84c0 in start_thread () from /lib/i686/cmov/libpthread.so.0
#6  0xb767784e in clone () from /lib/i686/cmov/libc.so.6
(gdb) up 4
#4  0x080484a6 in thread (x=0x0) at mutex_owner.c:8
8               pthread_mutex_lock(&mutex);
(gdb) print mutex.__data.__owner
$1 = 22025
(gdb)

In this session I switch to the hung thread, take a backtrace to find the pthread_mutex_lock() call it is stuck in, move up the stack to the frame that names the mutex it is trying to lock, and then print the owner of that mutex. The result tells me that the thread with LWP ID 22025 is the culprit.

You can then use thread find 22025 to find out the gdb thread number for that thread and switch to it.
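The same field can be read from inside the program while debugging. The following is a minimal sketch, assuming Linux with glibc/NPTL; the helper names current_tid(), mutex_owner_tid() and owner_after_self_lock() are made up for illustration, and __data.__owner is a private glibc field that may change or be left at 0.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Kernel thread ID of the caller; gdb shows the same number as "LWP". */
pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* glibc/NPTL only: read the owner TID recorded inside the mutex.
 * __data.__owner is a private field, so treat this as a debugging aid. */
pid_t mutex_owner_tid(pthread_mutex_t *m)
{
    return (pid_t)m->__data.__owner;
}

/* Lock a fresh mutex in the calling thread, print both thread identifiers
 * next to the recorded owner, and return that owner. */
pid_t owner_after_self_lock(void)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    int rc = pthread_mutex_lock(&m);
    assert(rc == 0);
    (void)rc;

    /* Safe to read here: this thread holds the lock. */
    pid_t owner = mutex_owner_tid(&m);
    printf("pthread_t=%lu tid=%d owner=%d\n",
           (unsigned long)pthread_self(), (int)current_tid(), (int)owner);

    pthread_mutex_unlock(&m);
    return owner;
}
```

Calling owner_after_self_lock() from any thread should print an owner that matches that thread's TID rather than its pthread_t value, which is why the number from gdb has to be matched against LWP IDs.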

caf
  • 1
    Is there a way to correlate `__data.__owner` with the pthread thread id? In playing with this I simply coded `log << mutex.__data.__owner << endl` and that appears to work fine. But the owner is a value like 9841 while the thread id is like 140505876686608. What is the relationship between the two values? – Duck Aug 19 '10 at 18:20
  • 6
    @Duck: The value in `.__data.__owner` is a TID. When each thread starts you could just have them log their TID (using `tid = syscall(SYS_gettid);`) as well as their `pthread_t` (from `pthread_self()`). – caf Aug 20 '10 at 00:09
  • 1
    You could also examine the thread's stack pointer in the `stat` file in `proc`, and it will be pretty close (within a few kb) of the `pthread_t` value. :-) – R.. GitHub STOP HELPING ICE Apr 29 '11 at 03:17
  • 3
    BTW: One could use `info threads` to map TIDs (`.__data.__owner`) to pthread IDs (the IDs that one operates on in gdb). – Adam Romanek May 29 '14 at 07:50
  • 11
    @caf, you can add to your answer that nowadays gdb has a `thread find` command. So after finding that `mutex.__data.__owner` is 22025 you can run `thread find 22025` and get the gdb number of that thread (example: `Thread 29 has target id 'Thread 0x7fffdf5fe700 (LWP 22025)' `). You can then switch to the thread that holds the lock with the command `thread 29` or just `t 29` –  Dec 18 '15 at 10:36
  • According to [this answer](https://stackoverflow.com/a/6697556/746346), the `__owner` field is sometimes not filled in, so it is 0 (invalid). – Tor Klingberg Feb 22 '18 at 17:20
5

I don't know of any such facility so I don't think you will get off that easily - and it probably wouldn't be as informative as you think in helping to debug your program. As low tech as it might seem, logging is your friend in debugging these things. Start collecting your own little logging functions. They don't have to be fancy, they just have to get the job done while debugging.

Sorry for the C++ but something like:

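#include <fstream>
#include <pthread.h>

using std::endl;

// Assumed globals (not shown in the original snippet); the file name is a placeholder.
static std::ofstream logfile("lock_trace.log");
static pthread_mutex_t log_mutex  = PTHREAD_MUTEX_INITIALIZER;  // serializes log lines
static pthread_mutex_t some_mutex = PTHREAD_MUTEX_INITIALIZER;  // the lock being traced

// Record either an attempt to take a lock or a successful acquisition.
void logit(const bool acquired, const char* lockname, const int linenum)
{
    pthread_mutex_lock(&log_mutex);

    if (! acquired)
        logfile << pthread_self() << " tries lock " << lockname << " at " << linenum << endl;
    else
        logfile << pthread_self() << " has lock "   << lockname << " at " << linenum << endl;

    pthread_mutex_unlock(&log_mutex);
}


void someTask()
{
    // A "tries lock" line with no matching "has lock" line marks the stuck thread.
    logit(false, "some_mutex", __LINE__);

    pthread_mutex_lock(&some_mutex);

    logit(true, "some_mutex", __LINE__);

    // do stuff ...

    pthread_mutex_unlock(&some_mutex);
}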

Logging isn't a perfect solution but nothing is. It usually gets you what you need to know.
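Since the question is about C, here is a rough C sketch of the same idea. The names log_lock_event(), logged_lock(), logged_unlock() and the two macros are hypothetical, and the code assumes Linux because it logs the kernel TID (via syscall(SYS_gettid)) so that log lines can be matched against the LWP numbers gdb prints.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Serializes trace lines so output from different threads does not interleave. */
static pthread_mutex_t log_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Emit one trace line tagged with the caller's kernel TID. */
void log_lock_event(const char *event, const char *lockname,
                    const char *file, int line)
{
    long tid = syscall(SYS_gettid);

    pthread_mutex_lock(&log_mutex);
    fprintf(stderr, "%ld %s %s at %s:%d\n", tid, event, lockname, file, line);
    pthread_mutex_unlock(&log_mutex);
}

/* Log the attempt, take the lock, then log the acquisition. */
int logged_lock(pthread_mutex_t *m, const char *name, const char *file, int line)
{
    assert(m != NULL && name != NULL);
    log_lock_event("tries lock", name, file, line);
    int rc = pthread_mutex_lock(m);
    if (rc == 0)
        log_lock_event("has lock", name, file, line);
    return rc;
}

/* Log the release before unlocking, so the line is written while still owning m. */
int logged_unlock(pthread_mutex_t *m, const char *name, const char *file, int line)
{
    assert(m != NULL && name != NULL);
    log_lock_event("releases lock", name, file, line);
    return pthread_mutex_unlock(m);
}

/* Convenience macros: capture the mutex's variable name and the call site. */
#define LOGGED_LOCK(m)   logged_lock(&(m), #m, __FILE__, __LINE__)
#define LOGGED_UNLOCK(m) logged_unlock(&(m), #m, __FILE__, __LINE__)
```

Replacing pthread_mutex_lock(&some_mutex) with LOGGED_LOCK(some_mutex) keeps the call sites short. In the resulting trace, the last "has lock" line for a mutex without a later "releases lock" line points at the holder.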

Duck
  • Logging indeed is quite useful tool for debugging. Thanks for your suggestions. – terry Aug 14 '10 at 16:12
  • 1
    +1 Who doesn't love logging? It could be done with no code changes using LD_PRELOAD (and some patience). Wrap `pthread_mutex_*` functions with something that logged the function calls, the mutex' address, and a thread identifier (`pthread_t` happens to be an integral type on Linux, not a portable assumption but quite a convenience). – pilcrow Aug 15 '10 at 14:25
  • 9
    A possible problem with logging is that it can disrupt the timing and make the issue vanish. – Spudd86 Nov 25 '10 at 18:39
  • Also you can't always/predictably interpose library functions. It's not a guarantee. – Matt Joiner Jul 03 '12 at 21:06
  • Logging is quite useful. However, there are some places where logging isn't safe. Specifically, `malloc` isn't safe in certain places - for example, in signal handlers, atfork handlers, between fork and exec in a multi-threaded program, etc. See [async-signal-safety](http://man7.org/linux/man-pages/man7/signal-safety.7.html) and the other man pages. – mgarey Apr 11 '18 at 15:46
3

Please read the link below. It describes a generic solution for finding the lock owner, and it works even if the lock is inside a library for which you don't have the source code.

https://en.wikibooks.org/wiki/Linux_Applications_Debugging_Techniques/Deadlocks

2

Calls into libc or the platform are normally wrapped by an OS abstraction layer, which gives you a place to track mutex deadlocks using an owner variable and pthread_mutex_timedlock(). Whenever a thread acquires the lock, it stores its own TID (from gettid()) in the owner variable; a second variable can store the pthread ID. When another thread blocks and pthread_mutex_timedlock() times out, it prints the stored owner TID and pthread ID, which identifies the owning thread. The code snippet below shows the idea; note that it does not handle every error condition.

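#include <errno.h>
#include <pthread.h>
#include <time.h>
#include <unistd.h>   // gettid() (glibc 2.30+; older systems need syscall(SYS_gettid))

pid_t ownerTid;   // TID of the thread that last acquired the mutex
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

// RAII guard: the constructor acquires the mutex, the destructor releases it.
class TimedMutex {
    public:
        TimedMutex()
        {
           struct timespec abs_time;

           while(1)
           {
               // pthread_mutex_timedlock() takes an absolute CLOCK_REALTIME deadline
               clock_gettime(CLOCK_REALTIME, &abs_time);
               abs_time.tv_sec += 10;
               if(pthread_mutex_timedlock(&mutex,&abs_time) == ETIMEDOUT)
               {
                   // log() is the application's own printf-style logger (not shown)
                   log("Lock held by thread=%d for more than 10 secs",ownerTid);
                   continue;   // keep waiting, reporting every 10 seconds
               }
               ownerTid = gettid();
               break;          // lock acquired: stop retrying
           }
        }

        ~TimedMutex()
        {
             pthread_mutex_unlock(&mutex);
        }
};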

There are other ways to track down deadlocks; this link might help: http://yusufonlinux.blogspot.in/2010/11/debugging-core-using-gdb.html
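For a plain C program, the owner-variable approach could look roughly like the sketch below. It is only an illustration under stated assumptions: struct tracked_mutex, tracked_lock() and tracked_unlock() are hypothetical names, the owner is kept per mutex in an atomic so a waiter can read it without holding the lock, and Linux is assumed for syscall(SYS_gettid).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical wrapper: a mutex plus the TID of its current holder. */
struct tracked_mutex {
    pthread_mutex_t lock;
    atomic_int owner;   /* 0 when unlocked; only a debugging hint */
};

#define TRACKED_MUTEX_INITIALIZER { PTHREAD_MUTEX_INITIALIZER, 0 }

/* Try to take the lock for at most timeout_sec seconds.
 * Returns 0 on success. On timeout it returns ETIMEDOUT, reports the
 * recorded holder on stderr and stores that TID in *holder. */
int tracked_lock(struct tracked_mutex *tm, int timeout_sec, pid_t *holder)
{
    struct timespec deadline;

    assert(tm != NULL && holder != NULL);
    /* pthread_mutex_timedlock() expects an absolute CLOCK_REALTIME time. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    int rc = pthread_mutex_timedlock(&tm->lock, &deadline);
    if (rc == 0) {
        atomic_store(&tm->owner, (int)syscall(SYS_gettid));
    } else if (rc == ETIMEDOUT) {
        *holder = (pid_t)atomic_load(&tm->owner);
        fprintf(stderr, "lock held by thread=%d for more than %d secs\n",
                (int)*holder, timeout_sec);
    }
    return rc;
}

/* Clear the owner while still holding the lock, then release it. */
int tracked_unlock(struct tracked_mutex *tm)
{
    assert(tm != NULL);
    atomic_store(&tm->owner, 0);
    return pthread_mutex_unlock(&tm->lock);
}
```

A caller that gets ETIMEDOUT can decide whether to retry or abort. The reported TID is the value to look up with gdb or under /proc/<pid>/task.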

hlovdal
Yusuf Khan