Mutex deadlock in apache due to SIGALRM

Question

We are running Apache 1.3.12 on Solaris 10 in our project, and we have found that some Apache processes end up in a stale (deadlocked) state. This is a single-threaded Apache build.

Analysis of a gcore dump shows that our code calls realloc to grow a buffer, but a timeout fires during that call and raises SIGALRM. The SIGALRM handler then runs cleanup code that resets the request and tries to allocate memory. Since realloc was interrupted before it finished, it never released the allocator lock, so the process deadlocks waiting for a lock it already holds. dbx confirms that the mutex is held by the same thread.

There are sufficient memory resources and disk space available on the system.

Stack Trace:

[1] __lwp_park(0x0, 0x0, 0x0, 0x0, 0xfdb34000, 0x1), at 0xff245898
[2] mutex_lock_queue(0x0, 0x0, 0xff271838, 0x0, 0x0, 0x1), at 0xff23dcd0
[3] malloc(0x48, 0x1, 0x98904, 0xfeb25b28, 0xff26e32c, 0xff2776f0), at 0xff1d5a64
[4] operator new(0x48, 0xffbf7910, 0x1aa0cf4, 0x137d4, 0xfeefa810, 0x1010101), at 0xfeee7064 
=>[5] TApacheWebRequest::ExtractOriginalURL(this = 0x6e6f38), line 415 in "src/mipadapters/apache/ApacheWebRequest.cpp"
[6] TApacheWebRequest::ResetRequest(this = 0x6e6f38, req = 0x712f80, internallyRedirected = false), line 162 in "src/mipadapters/apache/ApacheWebRequest.cpp"
[7] MIPPerformProxyResponseFilter(req = 0x712f80, cache = (nil), isHttp1 = 0, pContDelivered = 0xffbf7ae0), line 836 in "src/mipadapters/apache/mippxycapi.cpp"
[8] mip_proxy_adapter_cleanup(0x712f80, 0xdc000, 0x72fe88, 0x0, 0x990, 0xbf9d0), at 0xfee6a2cc
[9] ap_proxy_timeout_handler(0x712f80, 0x32ef08, 0x1b61c, 0x6b6470ff, 0x80808080, 0x1010101), at 0xfee6a584
[10] ap_hook_call_func(0xffbf7d58, 0x349200, 0x34b858, 0x0, 0x0, 0x0), at 0xfc5d8
[11] ap_hook_call(0x32eef4, 0x712f80, 0x0, 0x0, 0x0, 0x0), at 0xfa858
[12] timeout(0xe, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xd1ce8
[13] alrm_handler(0xe, 0x0, 0xffbf7f40, 0x1, 0x724fec, 0xfeb45c10), at 0xd214c
[14] __sighndlr(0xe, 0x0, 0xffbf7f40, 0xd2120, 0x0, 0x1), at 0xff245924
---- called from signal handler with signal 14 (SIGALRM) ------
[15] realloc(0x72fe80, 0xdc000, 0x72fe88, 0x0, 0x990, 0xbf9d0), at 0xff1d5e58
[16] ProxyResponse(0x712f80, 0x72bec8, 0x3743b8, 0x72c230, 0x701b90, 0x0), at 0xfee7c2bc
[17] ap_proxy_http_handler(0x0, 0xffbfc430, 0xffbfc474, 0x0, 0x0, 0x72bfa0), at 0xfee80590
[18] mip_proxy_handler(0x72bec8, 0x714d3e, 0x0, 0x0, 0x0, 0x0), at 0xfee6c414
[19] proxy_handler(0x712f80, 0xfee8a3c0, 0x0, 0x0, 0x72766572, 0x72766572), at 0xfee6af5c
[20] ap_invoke_handler(0x712f80, 0x71, 0x0, 0x7149bf, 0x80808080, 0x1010101), at 0xc2e2c
[21] process_request_internal(0x712f80, 0x1, 0x40, 0xe7084, 0xff322da4, 0x3d147af4), at 0xe4924
[22] ap_process_request(0x712f80, 0x4, 0x712f80, 0xffbfe8b0, 0xffbfe8c0, 0x94), at 0xe49d4
[23] child_main(0x94, 0xd4048, 0xd35b4, 0x0, 0xff3a2000, 0xff273580), at 0xd670c
[24] make_child(0x340880, 0x94, 0x51ac7fb6, 0x0, 0xffbfea44, 0x0), at 0xd6b0c
[25] perform_idle_server_maintenance(0xffffffff, 0x0, 0x0, 0x340880, 0x2ea900, 0x2b7400), at 0xd705c
[26] standalone_main(0x4, 0xffbfebcc, 0x31a594, 0xffffffff, 0xa4c48, 0xc), at 0xd7834
[27] main(0x4, 0xffbfebcc, 0xffbfebe0, 0x318800, 0xff3a0100, 0xff3a0140), at 0xd82e0

Stack trace explanation: ProxyResponse is a customized function that handles both chunked and non-chunked transfer encoding.

In this function, SIGALRM arrives while the buffer is being reallocated, as shown below:

/* Increase buffer size if there is not enough space to read */
if ((bufsize - total_bytes_recvd) < n)
{
    buf = (unsigned char *)realloc(buf, (bufsize + allocLen));
    bufsize += allocLen;
}

After SIGALRM is delivered, Apache's cleanup runs to reset the request.

I would appreciate any help or pointers on this issue.

Thanks in advance.

Deepak

apache 1.3.12? wasn't that like 2000 or so? anyways, never allocate memory in a signal handler. POSIX has a list of signal safe functions, you should only ever use them and never anything else. — PlasmaHH, Jul 17 '13 at 08:13
Take a look at this previous question (http://stackoverflow.com/a/4925670/91042), as it points to some probable causes. I suspect this is not an option, but the Apache version you are using is rather old; could you test with an up-to-date version of Apache to see whether the problem appears again? — Daniel H., Jul 17 '13 at 08:14
It would be difficult to move to the latest Apache version, since we have made some customizations to this version. — user2590374, Jul 17 '13 at 08:39

