I had a similar situation using pcntl_fork()
and a MySQL connection. The cause here is probably the same.
Background info
popen()
creates a child process. The call to pclose()
closes the communication channel and the child process continues to run until it exits. This is when the things start to go out of control.
When a child process completes, the parent process receives a SIGCHLD
signal. The parent process here is the PHP interpreter that runs the code you posted. The child process is the one launched using popen()
(it doesn't matter what command it runs).
There is a small thing here you probably don't know or you have found in the documentation and ignored it because it doesn't make much sense when one programs in PHP. It is mentioned in the documentation of sleep()
:
If the call was interrupted by a signal, sleep()
returns a non-zero value.
The sleep()
PHP function is just a wrapper of the sleep()
Linux system call (and usleep()
PHP function is a wrapper of the usleep()
Linux system call.)
What is not told in the PHP documentation is clearly stated in the documentation of the system calls:
sleep()
makes the calling thread sleep until seconds seconds have elapsed or a signal arrives which is not ignored.
Back to your code.
There are two places in your code where the PHP interpreter calls the usleep()
Linux system function. One of them is clearly visible: your PHP code invokes it. The other one is hidden (see below).
What happens (the visible part)
Starting with the second iteration, if a child process (created using popen()
on a previous iteration) happens to exit while the parent program is inside the usleep(100000)
call, the PHP interpreter process receives the SIGCHLD
signal and its execution resumes before the time being out. The usleep()
returns earlier than expected. Because the timeout is short, this effect is not observable by the naked eye. Put 10 seconds instead of 0.1 seconds and you'll notice it.
However, apart from the broken timeout, this doesn't affect the execution of your code in a fatal manner.
Why it crashes (the invisible part)
The second place where an incoming signal hurts your programs execution is hidden deep inside the code of the PHP interpreter. For some protocol reasons, the MySQL client library uses sleep()
and/or usleep()
in several places. If the interpreter happens to be inside one of these calls when the SIGCHLD
arrives, the MySQL client library code is resumed unexpectedly and, many times, it concludes with the erroneous status "MySQL server has gone away (error 2006)".
It's possible that your code ignores (or swallows) the MySQL error status (because it doesn't expect it to happen in that place). Mine didn't and I spent a couple of days of investigation to find out the facts summarized above.
A solution
The solution for the problem is easy (after you know all the internal details exposed above). It is hinted in the documentation quote above: "a signal arrives which is not ignored".
The signals can be masked (ignored) when their arrival is not desired. The PHP PCNTL extension provides the function pcntl_sigprocmask()
. It wraps the sigprocmask()
Linux system call that sets what signals can be received by the program from now on (in fact, what signals to be blocked).
There are two strategies you can implement, depending of what you need.
If your program needs to communicate with the database and be notified when the child processed complete then you have to wrap all your database calls within a pair of calls to pcntl_sigprocmask()
to block then unblock the SIGCHLD
signal.
If you doesn't care when the child processes complete then you just call:
pcntl_sigprocmask(SIG_BLOCK, array(SIGCHLD));
before you start creating any child process (before the while()
).
It makes your process ignore the termination of the child processes and lets it run its database queries without undesired interruption.
Warning
The default handling of the SIGCHLD
signal is to call wait()
in order to let the system cleanup after the completed child process. What happens if the signal is not handled (because its delivery is blocked) is explained in the documentation of wait()
:
A child that terminates, but has not been waited for becomes a "zombie". The kernel maintains a minimal set of information about the zombie process (PID, termination status, resource usage information) in order to allow the parent to later perform a wait to obtain information about the child. As long as a zombie is not removed from the system via a wait, it will consume a slot in the kernel process table, and if this table fills, it will not be possible to create further processes. If a parent process terminates, then its "zombie" children (if any) are adopted by init(1)
, which automatically performs a wait to remove the zombies.
In plain English, if you block the reception of SIGCHLD
signal, then you have to call pcntl_wait()
in order to cleanup the zombie child processes.
You can add:
pcntl_wait($status, WNOHANG);
somewhere inside the while
loop (just before it ends, for example).