
I am currently setting up a web service powered by Apache and running on CentOS 6.4. This service uses Perl scripts (cgi-bin) which, in particular, launch external, home-made, compiled Fortran binaries.

Here is the issue: when I boot my server, everything works except that one of my binaries systematically crashes (with a segfault reported by the kernel) when called by my Perl scripts.

If I manually restart the httpd service (at the command line: service httpd restart), the issue disappears completely. I examined the Apache and system logs and found nothing suspicious.

It appears that the problem occurs only when httpd is launched by the /etc/rc[0-6].d startup scripts. I tried moving httpd (S85httpd by default) to other positions in the startup order, without success.

To summarize, my web service only works (with no external binary crash) when httpd is started from the command line after the server has fully booted!

[EDIT] This issue is now resolved:

My Fortran binary handles very large arrays and complex functions that require an unlimited stack size.

Although the stack size limit was defined system-wide (in /etc/security/limits.conf), for some reason the Apache/Perl/Fortran chain was not aware of it, so my binary crashed each time it was called. By contrast, when I manually restarted Apache at the shell prompt, the stack size limit was passed correctly (my .bashrc contains 'ulimit -S -s unlimited').
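
One way to see which limit each process actually received is to compare its entry in /proc/<pid>/limits, e.g. for an httpd child started at boot versus one started after a manual restart. The sketch below is in Python rather than Perl, assumes a Linux kernel recent enough to expose /proc/<pid>/limits, and uses a hypothetical helper name, stack_limits:

```python
import os
import sys


def stack_limits(pid="self"):
    """Return the (soft, hard) 'Max stack size' fields of /proc/<pid>/limits.

    Values are returned as strings, e.g. ("8388608", "unlimited");
    numeric values are in bytes.
    """
    path = os.path.join("/proc", str(pid), "limits")
    with open(path) as f:
        for line in f:
            if line.startswith("Max stack size"):
                # Remaining columns: soft limit, hard limit, units
                fields = line[len("Max stack size"):].split()
                return fields[0], fields[1]
    raise ValueError("no 'Max stack size' line in " + path)


if __name__ == "__main__":
    # Pass the PID of an httpd worker, or nothing to inspect this process.
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    soft, hard = stack_limits(pid)
    print("soft:", soft, "hard:", hard)
```

Running it with the PID of an httpd worker (for example one found with `pgrep httpd`) prints that worker's soft and hard stack limits, which makes a boot-time versus manual-restart difference visible directly.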

As a workaround, I used the BSD::Resource module (http://metacpan.org/pod/BSD::Resource) to set the stack size directly in my Perl script, e.g. with setrlimit(RLIMIT_STACK, $softlimit, $hardlimit);

Thus, the new stack size limit is now passed directly from my Perl script to my binary.
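
The same idea (set the limit in the launching script, let the child inherit it) can be sketched in Python, where the standard resource module plays the role of BSD::Resource. This is an illustration under assumptions, not the original Perl code: the helper names are hypothetical, and a `sh -c 'ulimit -S -s'` child stands in for the Fortran binary so the inherited value is visible. An unprivileged process can raise its soft limit only up to the hard limit, which is why the sketch uses the hard limit instead of requesting unlimited outright:

```python
import resource
import subprocess


def raise_stack_soft_limit():
    """Raise this process's soft stack limit up to its hard limit.

    An unprivileged process may raise the soft limit only as far as the
    hard limit; requesting RLIM_INFINITY works only if the hard limit is
    already unlimited (or the process is privileged).
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    resource.setrlimit(resource.RLIMIT_STACK, (hard, hard))
    return hard


def run_binary(cmd):
    """Run cmd; the child inherits the limits set above across fork/exec."""
    return subprocess.run(cmd, capture_output=True, text=True)


if __name__ == "__main__":
    raise_stack_soft_limit()
    # Hypothetical stand-in for the Fortran binary: a shell that prints
    # its own soft stack limit (in KiB, or "unlimited").
    result = run_binary(["sh", "-c", "ulimit -S -s"])
    print(result.stdout.strip())
```

Setting the limit in the parent affects every child launched afterwards, which matches calling setrlimit before system() in Perl. subprocess.run also accepts a preexec_fn that could set the limit only in the child, but the Python documentation warns that preexec_fn is not safe in the presence of threads.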

szabgab
Ledoc

1 Answer


I've run into similar problems before. Maybe one way to solve this is to put the binary on a 'delayed start', so that it starts after everything else on your system is running. One way to do this is to add an `at` job to your /etc/rc.local script that starts the binary after X minutes.

mti2935
  • Thanks for your input, but it is not suitable: the binary is launched through a web form when users ask for it. It is not a system service but an ‘on demand’ execution. – Ledoc Nov 21 '13 at 10:14
  • OK, I re-read your question, and I see what you mean now. It sounds like the httpd service might be starting too soon, because when you restart it later, it solves the problem. Maybe starting the httpd service on a timed delay (using `at` in one of the rc scripts) would solve the problem? – mti2935 Nov 21 '13 at 10:42
  • Smart workaround, but it still doesn't work: even with an httpd restart scheduled several minutes after the last rc script, the issue persists. It appears that Apache must be restarted manually (at the prompt) to work correctly. – Ledoc Nov 21 '13 at 12:27
  • Strange. I wonder if this is a permissions issue. Is it possible that when you restart the httpd service manually, you are doing so as a different user than the user that the service runs under when the rc scripts start the service? Or, perhaps it's a relative path issue. Could there be something looking for a file in a relative path that is correct when you restart the service manually, but is broken when the rc script tries to start it? – mti2935 Nov 21 '13 at 14:00
  • Unfortunately, none of these applies: the rc startup script and the manual restart script are exactly the same, and both are invoked by the root user. All permissions and paths are correct. – Ledoc Nov 21 '13 at 15:11
  • How does the perl script execute the binary program? Using the backticks? using qx()? using system()? I'm wondering if the binary is throwing some error message, but you're not seeing it - i.e. perhaps because it's being sent to stderr, and you're not capturing what is being sent to stderr. Are you capturing what is being written by the binary program to both stdout and stderr? – mti2935 Nov 21 '13 at 16:18
  • The binary program is executed using system() and fully logs through stdout and stderr (it has been heavily tested). I guess my binary crashes (segfault) right at its very beginning. By the way, I tried several compilers, with different debugging levels and -O optimization levels. Still the same issue, with no message from my binary. The problem seems to be deep inside... something :-/ – Ledoc Nov 21 '13 at 18:07
  • So, you are able to make changes to the source code of the Fortran program, and recompile to create a new binary? In that case, how about adding statements to produce output to stderr at various points in the program, then logging the output of stderr, so that you can see where it's crashing? – mti2935 Nov 21 '13 at 19:03
  • Laborious but efficient. I'll try that. Thanks! – Ledoc Nov 21 '13 at 22:47
  • My issue is now resolved. See my edit. Thanks for your time and providing me with insights to fix this issue! – Ledoc Nov 22 '13 at 08:53
  • Interesting. So, the system wasn't picking up the instructions in your .bashrc script when it started the httpd service automatically at boot. That makes sense, because .bashrc is only run when a shell starts, i.e. when you log in. Thanks for posting the answer to this mystery. – mti2935 Nov 22 '13 at 12:27
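
On the point about capturing what the binary does: a segfault usually leaves nothing on stdout or stderr, but it does show up in the exit status. In Perl, after system() the signal number is in `$? & 127`. The hedged Python sketch below shows the same check with subprocess, where a negative return code means the child was killed by that signal. The helper name run_and_report is hypothetical, and a child that sends itself SIGSEGV stands in for the crashing binary:

```python
import signal
import subprocess
import sys


def run_and_report(cmd):
    """Run cmd, capturing stdout and stderr, and describe how it ended.

    subprocess reports death by signal as a negative return code
    (-N for signal N), so a segfault is detectable even when the
    program printed nothing at all.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode < 0:
        status = "killed by " + signal.Signals(-proc.returncode).name
    else:
        status = "exited with status %d" % proc.returncode
    return status, proc.stdout, proc.stderr


if __name__ == "__main__":
    # Stand-in for the crashing binary: a child that sends itself SIGSEGV.
    crasher = [sys.executable, "-c",
               "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
    print(run_and_report(crasher)[0])  # killed by SIGSEGV
```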