
The code I am working on makes a lot of calls that create new strings. Recently, after upgrading the servers to Ubuntu 12.10, I have started having trouble: some of the child processes get stuck in a futex. I attached GDB to a process that had been waiting in the futex for a long time, ran a backtrace, and got the following:

    #0  0x00007f563afc69bb in ?? () from /lib/x86_64-linux-gnu/libc.so.6
    #1  0x00007f563af4a221 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
    #2  0x00007f563af47fa7 in malloc () from /lib/x86_64-linux-gnu/libc.so.6
    #3  0x00007f563afcfbfa in backtrace_symbols () from /lib/x86_64-linux-gnu/libc.so.6
    #4  0x0000000000446945 in sig_segv (signo=<optimized out>) at FILE THAT HAS THE HANDLER,SIGHANDLER
    #5  <signal handler called>
    #6  0x00007f563aefb425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
    #7  0x00007f563aefeb8b in abort () from /lib/x86_64-linux-gnu/libc.so.6
    #8  0x00007f563af3939e in ?? () from /lib/x86_64-linux-gnu/libc.so.6
    #9  0x00007f563af43b96 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
    #10 0x00007f563af463e8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
    #11 0x00007f563af47fb5 in malloc () from /lib/x86_64-linux-gnu/libc.so.6
    #12 0x00007f563b7f660d in operator new(unsigned long) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
    #13 0x00007f563b8533b9 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) ()
       from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
    #14 0x00007f563b854d95 in char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
    #15 0x00007f563b854e73 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
    #16 0x0000000000412362 in MyString (bs=0x4aabd6 "-", this=0x7fffe854f940) at CONSTRUCTOR FROM C-STRING MyString(const char* bs):std::string(bs) {};
    #17 A FUNCTION THAT CALLS THE ABOVE LINE

I am confused. I checked the memory, and the machine had nearly 20 GB of free RAM. So how can a function crash in malloc? I understand why it is stuck in the futex, but why does it fail in malloc in the first place? I would really appreciate an explanation.

The crash happens after this line is called:

    MyString(const char* bs):std::string(bs) {};

This constructor converts a plain C string into a std::string; MyString is my own class. I can't post the entire code here for two reasons: 1) the code is owned by my company, and 2) it is very long.
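For anyone who wants something to compile, here is a minimal sketch of how the class might look. It is a hypothetical reconstruction that assumes MyString simply derives from std::string, since the real class is not shown. The `_Rep::_S_create` frames in the backtrace belong to libstdc++'s older copy-on-write std::string, which obtains storage through operator new (and therefore malloc()) for any non-empty string, so even the one-character `"-"` reaches malloc(). Newer libstdc++ versions use the short-string optimization and would not allocate for a string that short.

```cpp
#include <string>

// Hypothetical reconstruction: the real MyString is not shown, so this
// assumes it is a thin wrapper that publicly derives from std::string.
class MyString : public std::string {
public:
    // The converting constructor from the question. Initializing the
    // std::string base copies the characters; with the old copy-on-write
    // libstdc++ string this allocates for every non-empty string, which is
    // how constructing from "-" ends up calling malloc().
    MyString(const char* bs) : std::string(bs) {}
};
```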

Sorry about that. I just need an explanation of why it crashes in malloc, which then causes a deadlock: the signal handler also calls malloc, so it waits for a lock that the interrupted malloc call will never release.
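The deadlock described above can be modeled in a few lines. This is a hypothetical sketch, not glibc's actual malloc: a single non-recursive lock (a std::atomic_flag here) stands in for the allocator's internal lock, and `handler_would_deadlock()` stands in for the signal handler's malloc() call. A real malloc() would wait for the lock instead of reporting failure, and because the lock holder is the very code the handler interrupted, which cannot resume until the handler returns, that wait never ends.

```cpp
#include <atomic>

// Hypothetical model, not glibc's real malloc: one non-recursive lock that
// guards the allocator's internal state.
static std::atomic_flag allocator_lock = ATOMIC_FLAG_INIT;

// Returns true if the lock was free and is now held by the caller.
bool try_lock_allocator() {
    return !allocator_lock.test_and_set(std::memory_order_acquire);
}

void unlock_allocator() {
    allocator_lock.clear(std::memory_order_release);
}

// Models the signal handler calling malloc() (through backtrace_symbols()).
// Returns true when the lock is already taken, i.e. when a real malloc()
// would block forever waiting for the interrupted call to release it.
bool handler_would_deadlock() {
    if (!try_lock_allocator()) {
        return true;   // held by the malloc() call the signal interrupted
    }
    unlock_allocator();
    return false;
}
```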

Prasanth Madhavan

2 Answers


It looks like you might be calling malloc() (indirectly, through backtrace_symbols()) in a signal handler. Don't.

malloc() is not async-signal safe. Calling it inside a signal handler while other code is in malloc() will likely deadlock you (as it did here).

Use backtrace_symbols_fd() instead; it won't call malloc().
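A sketch of a crash handler along these lines, assuming Linux with glibc; `crash_handler` and `install_crash_handler` are hypothetical names. backtrace() and backtrace_symbols_fd() are not on the POSIX async-signal-safe list, but the glibc documentation states that backtrace_symbols_fd() does not call malloc(), and that backtrace() may allocate the first time it is used because it loads libgcc dynamically. The sketch therefore calls backtrace() once at installation time, and otherwise uses only write(2) and raise() inside the handler. It also installs the handler for SIGABRT, which covers the situation in this question where abort() was called from inside malloc().

```cpp
#include <execinfo.h>  // backtrace, backtrace_symbols_fd (glibc)
#include <signal.h>    // sigaction, sigemptyset, raise
#include <unistd.h>    // write, STDERR_FILENO

// write(2) is async-signal-safe, unlike stdio or iostreams.
static void write_stderr(const char* msg, size_t len) {
    ssize_t ignored = write(STDERR_FILENO, msg, len);
    (void)ignored;
}

// Hypothetical handler: no backtrace_symbols(), no std::string, no printf,
// so nothing here should need malloc().
extern "C" void crash_handler(int signo) {
    static const char header[] = "fatal signal, backtrace follows:\n";
    write_stderr(header, sizeof header - 1);

    void* frames[64];
    int depth = backtrace(frames, 64);
    // Writes one line per frame straight to the descriptor instead of
    // returning a malloc()ed array of strings like backtrace_symbols().
    backtrace_symbols_fd(frames, depth, STDERR_FILENO);

    // SA_RESETHAND restored the default action on entry, so the re-raised
    // signal gets the default action and terminates the process.
    raise(signo);
}

// Hypothetical helper: call it early in main().
bool install_crash_handler() {
    // The first backtrace() call may load libgcc, which can allocate;
    // doing it here keeps that allocation out of the signal handler.
    void* warmup[1];
    backtrace(warmup, 1);

    struct sigaction sa{};
    sa.sa_handler = crash_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;  // one-shot: a fault inside the handler is not re-handled
    return sigaction(SIGSEGV, &sa, nullptr) == 0 &&
           sigaction(SIGABRT, &sa, nullptr) == 0;
}
```

Because the signal is re-raised with its default action, the process still dies with the original signal, so a core dump (if enabled) remains available for GDB.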

Hasturkun
  • Thanks, this will fix the deadlock. But how does the crash happen? – Prasanth Madhavan Mar 18 '13 at 13:07
  • @PrasanthMadhavan: The crash happens because of an error in your code. (What more can we say given the amount of context which you've posted?) – CB Bailey Mar 18 '13 at 13:10
  • At a guess, you probably corrupted the heap somewhere. You can use valgrind to find the problem. You could also try setting the environment variable `MALLOC_CHECK_` to `3`, which may make it `abort()` earlier, possibly closer to the point of corruption. – Hasturkun Mar 18 '13 at 13:11
  • Also, on second reading, it's possible that you're using the same handler for `SIGABRT`. AFAICT, the malloc implementation was probably already trying to report the corruption. – Hasturkun Mar 18 '13 at 13:14
  • @CharlesBailey: I am not saying there is no error in my code, but this code has handled millions of requests without crashing at this point. It has crashed before, but not here. Nothing new has been added except the OS upgrade. – Prasanth Madhavan Mar 18 '13 at 13:15
  • @Hasturkun I will replace backtrace_symbols() for now; that will prevent the deadlock. The abort was called from inside malloc, and the handler then tries to call malloc. It's messed up. – Prasanth Madhavan Mar 18 '13 at 13:17
  • #0 0x7552b064 in raise () from /lib/libc.so.1 #1 0x75525240 in abort () from /lib/libc.so.1 #2 0x75523e08 in free () from /lib/libc.so.1 Backtrace stopped: previous frame identical to this frame (corrupt stack?) Please help me solve this problem. – lucifer Apr 22 '16 at 02:52
  • @lucifer: I'm sorry, but that isn't enough information to tell you much of anything. You probably have heap corruption (or a double `free()`). Try setting `MALLOC_CHECK_` as described above. If you're still having trouble, you should ask a new question with a minimal, complete, and verifiable example. – Hasturkun Apr 22 '16 at 03:15

The memory pointed to by the string might be corrupted or already freed.

This problem might have been there all along and only manifested now because of a change in the compiler or other libraries.

Run your code under Valgrind to debug memory corruption issues.

vikesh