
I am trying to write a wrapper for the pthreads library that logs some information whenever one of its APIs is called. One piece of information I would like to record is the stack trace.

Below is the minimal snippet from the original code that can be compiled and run AS IS.

Initializations (file libmutex.c):

#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <dlfcn.h>

static int (*real_mutex_lock)(pthread_mutex_t *) __attribute__((__may_alias__));
static void *pthread_libhandle;

#ifdef _BIT64
#define PTHREAD_PATH      "/lib64/libpthread.so.0"
#else
#define PTHREAD_PATH      "/lib/libpthread.so.0"
#endif 

static inline void load_real_function(char* function_name, void** real_func) {
  char* msg;
  *(void**) (real_func) = dlsym(pthread_libhandle, function_name);
  msg = dlerror();
  if (msg != NULL)
    printf("init: real_%s load error %s\n", function_name, msg);
}

void __attribute__((constructor)) my_init(void) {
   printf("init: trying to dlopen '%s'\n", PTHREAD_PATH);
   pthread_libhandle = dlopen(PTHREAD_PATH, RTLD_LAZY);
   if (pthread_libhandle == NULL) {
     fprintf(stderr, "%s\n", dlerror());
     exit(EXIT_FAILURE);
  }
  load_real_function("pthread_mutex_lock", (void**) &real_mutex_lock);
}

The wrapper and the call to backtrace. I have chopped as much as possible from the methods, so yes, I know that I never call the original pthread_mutex_lock for example.

void my_backtrace(void) {
    #define SIZE 100
    void *buffer[SIZE];
    int nptrs;

    nptrs = backtrace(buffer, SIZE);
    printf("backtrace() returned %d addresses\n", nptrs);
}

int pthread_mutex_lock(pthread_mutex_t *mutex) {
  printf("In pthread_mutex_lock\n"); fflush(stdout);
  my_backtrace();
  return 0;
}

To test this I use this binary (file tst_mutex.c):

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>

int main (int argc, char *argv[]) {
  pthread_mutex_t x;

  printf("Before mutex\n"); fflush(stdout);
  pthread_mutex_lock(&x);
  printf("after  mutex\n");fflush(stdout);

  return 0;
}

Here is the way all this is compiled:

rm -f *.o *.so tst_mutex

cc -Wall -D_BIT64 -c -m64 -fPIC libmutex.c
cc -m64 -o libmutex.so -shared -fPIC libmutex.o -ldl -lpthread

cc -Wall -m64 tst_mutex.c  -o tst_mutex

and run

LD_PRELOAD=$(pwd)/libmutex.so ./tst_mutex

This crashes with a segmentation fault on Linux x86. On Linux PPC everything works flawlessly. I have tried several versions of GCC, glibc and several Linux distros; all of them fail.

The output is

init: trying to dlopen '/lib64/libpthread.so.0'
Before mutex
In pthread_mutex_lock
In pthread_mutex_lock
In pthread_mutex_lock
...
...
./run.sh: line 1: 25023 Segmentation fault      LD_PRELOAD=$(pwd)/libmutex.so ./tst_mutex

suggesting that there is a recursion here. I have looked at the source code for backtrace(): it makes no call to any locking mechanism. All it does is a simple walk over the stack frame linked list. I have also checked the library code with objdump, but that hasn't revealed anything out of the ordinary.

What is happening here? Any solution/workaround?

Oh, and maybe the most important thing. This only happens with the pthread_mutex_lock function!! Printing the stack from any other overridden pthread_* function works just fine ...

  • Have you tried with RTLD_NOW? – stark Apr 24 '13 at 16:41
  • The problem isn't opening the pthreads library. – Johnny English Apr 24 '13 at 22:00
  • I tried your steps but I get an error `./tst_mutex: symbol lookup error: libmutex.so: undefined symbol: dlopen`. I'm not going to troubleshoot that now, but maybe something was missing from the steps for reproducing the problem. – Gabriel Southern Apr 24 '13 at 22:36
  • I would guess that `backtrace` calls `pthread_mutex_lock` somewhere internally, so you end up with a loop. The segfault happens when the stack overflows. To avoid it, you need to NOT hook the entry point while you're in the hook. – Chris Dodd Apr 25 '13 at 00:23
  • @Gabriel: I have edited the question. The first two code sections go into a file called libmutex.c. The third code segment (the test) goes into tst_mutex.c. `#include <dlfcn.h>` provides the dlopen function. – Johnny English Apr 25 '13 at 05:54
  • @ChrisDodd: I checked the source code of `backtrace`. All it does is loop over the stack frames. There is no call to any locking mechanism, and it should not need one either, since it accesses the stack frames of the current thread in read-only mode. I have edited the question. – Johnny English Apr 25 '13 at 05:55
  • Can you list the glibc versions you've tried? I know there were issues in glibc 2.3 related to pthread_mutex_lock breaking backtraces. In personal experience, core files would often have broken backtraces from libpthread in glibc 2.3. Here's an example bug report I was able to find: http://cygwin.com/frysk/bugzilla/show_bug.cgi?id=9430 Not sure if those older problems are relevant here. – R Perrin Dec 09 '13 at 21:56

1 Answer

It is a stack overflow caused by endless recursion, as @Chris Dodd remarked. The backtrace() function issues different system calls depending on whether the program is linked against the pthread library, even if the program never calls a pthread function explicitly.

Here is a simple program that uses the backtrace() function and does not use any pthread function.

#include <stdio.h>
#include <stdlib.h>
#include <execinfo.h>

int main(void)
{
 void* buffer[100];
 int num_ret_addr;

 num_ret_addr=backtrace(buffer, 100); 
 printf("returned number of addr %d\n", num_ret_addr);

 return 0;
}

Let's compile it without linking against the pthread library and inspect the program's system calls with the strace utility. No mutex-related system call appears in the output.

$ gcc -o backtrace_no_thread backtrace.c
$ strace -o backtrace_no_thread.out ./backtrace_no_thread

Now let's compile the same code, this time linking it against the pthread library, run strace again and look at its output.

$ gcc -o backtrace_with_thread backtrace.c -lpthread
$ strace -o backtrace_with_thread.out ./backtrace_with_thread

This time the output contains mutex-related system calls (their names may depend on the platform). Here is a fragment of the strace output file obtained on an x86 Linux machine.

futex(0x3240553f80, FUTEX_WAKE_PRIVATE, 2147483647) = 0
futex(0x324480d350, FUTEX_WAKE_PRIVATE, 2147483647) = 0
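
One possible workaround, following the advice in the comments not to hook the entry point while already inside the hook, is a per-thread re-entrancy flag. The sketch below is a standalone illustration rather than a drop-in fix: hooked_lock, log_with_internal_lock and the two counters are hypothetical names, and log_with_internal_lock only stands in for whatever locking backtrace() appears to do internally on x86. In the preloaded library the same guard would presumably sit at the top of the pthread_mutex_lock wrapper and forward to real_mutex_lock instead.

```c
#include <execinfo.h>
#include <pthread.h>
#include <stdio.h>

/* Per-thread flag: nonzero while this thread is already inside the hook. */
static __thread int in_hook;

/* Counters for illustration only (hypothetical, not in the original code). */
static int logged_calls;
static int passthrough_calls;

int hooked_lock(pthread_mutex_t *mutex);

/* Stand-in for a logging step that takes a lock internally (assumption:
 * this mimics what backtrace() ends up doing when libpthread is loaded). */
static void log_with_internal_lock(void) {
    static pthread_mutex_t internal = PTHREAD_MUTEX_INITIALIZER;
    void *buffer[100];
    int nptrs = backtrace(buffer, 100);

    fprintf(stderr, "backtrace() returned %d addresses\n", nptrs);
    hooked_lock(&internal);            /* re-enters the hook */
    pthread_mutex_unlock(&internal);
}

/* Hook with a re-entrancy guard. Here it forwards to the real
 * pthread_mutex_lock because nothing is interposed in this sketch. */
int hooked_lock(pthread_mutex_t *mutex) {
    if (in_hook) {
        /* Nested call made by our own logging: skip logging, just lock. */
        passthrough_calls++;
        return pthread_mutex_lock(mutex);
    }

    in_hook = 1;
    logged_calls++;
    log_with_internal_lock();
    in_hook = 0;

    return pthread_mutex_lock(mutex);
}
```

Without the in_hook check, the nested call would log again and recurse until the stack overflows, which matches the repeated "In pthread_mutex_lock" lines in the question. The flag is thread-local so that a hook running in one thread does not suppress logging in another.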
MichaelGoren