
I have the following PHP 5.6.19 code on a Ubuntu 14.04 server. This code simply connects to a MySQL 5.6.28 database, waits a minute, launches another process of itself, then exits.

Note: this is the full script, and its purpose is to demonstrate the problem - it doesn't do anything useful.

class DatabaseConnector {
    const DB_HOST = 'localhost';
    const DB_NAME = 'database1';
    const DB_USERNAME = 'root';
    const DB_PASSWORD = 'password';

    public static $db;

    public static function Init() {
        if (DatabaseConnector::$db === null) {
            DatabaseConnector::$db = new PDO('mysql:host=' . DatabaseConnector::DB_HOST . ';dbname=' . DatabaseConnector::DB_NAME . ';charset=utf8', DatabaseConnector::DB_USERNAME, DatabaseConnector::DB_PASSWORD);
        }
    }
}

$startTime = time();

// ***** Script works fine if this line is removed.
DatabaseConnector::Init();

while (true) {
    // Sleep for 100 ms.
    usleep(100000);

    if (time() - $startTime > 60) {
        $filePath = __FILE__;
        $cmd = "nohup php $filePath > /tmp/1.log 2>&1 &";

        // ***** Script sometimes exits here without opening the process and without errors.
        $p = popen($cmd, 'r');

        pclose($p);

        exit;
    }
}

I start the first process of the script using `nohup php myscript.php > /tmp/1.log 2>&1 &`.

This process loop should go on forever but, based on multiple tests, within a day (though not instantly) the process on the server "disappears" for no apparent reason. I discovered that the MySQL code is causing the popen call to fail (the script exits without any error or output).

What is happening here?


Notes

  • The server runs 24/7.
  • Memory is not an issue.
  • The database connects correctly.
  • The file path does not contain spaces.
  • The same problem exists when using shell_exec or exec instead of popen (and pclose).

I also know that popen is the line that fails because I did further debugging (not shown above) by logging to a file at certain points in the script.

Code

4 Answers


Is the parent process definitely exiting after forking? I had thought pclose would wait for the child to exit before returning.

If it isn't exiting, I'd speculate that because the MySQL connection is never closed, you're eventually hitting its connection limit (or some other limit) as you spawn the tree of child processes.

Edit 1

I've just tried to replicate this. I altered your script to fork every half-second, rather than every minute, and was able to kill it off within about 10 minutes.

It looks like the repeated creation of child processes generates ever more FDs, until eventually no more can be opened:

$ lsof | grep type=STREAM | wc -l
240
$ lsof | grep type=STREAM | wc -l
242
...
$ lsof | grep type=STREAM | wc -l
425
$ lsof | grep type=STREAM | wc -l
428
...

And that's because the child inherits the parent's FDs (in this case the socket for the MySQL connection) when it forks.

If you close the MySQL connection before popen with (in your case):

DatabaseConnector::$db = null;

the problem will hopefully go away.
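As a sketch (reconnecting afterwards is only needed if the parent still uses the database; `Init()` is the method from the question):

```php
// Drop the PDO handle so its socket FD is closed and cannot be
// inherited by the child created by popen().
DatabaseConnector::$db = null;

$p = popen($cmd, 'r');
pclose($p);

// Optional: reconnect if the parent still needs the database.
DatabaseConnector::Init();
```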

jstephenson
  • It is - there's always at most one process when I check with `ps aux`. It also happens with `shell_exec` and `exec`. – Code May 11 '16 at 09:02
  • What are FDs? What does the lsof command do? It always prints 0 for me. – Code May 12 '16 at 02:31
  • @tumber033 'File descriptors' sorry, so open files/sockets/etc. There's a limit on the number of these which you can check with `ulimit -n`. `lsof` lists the currently open FDs — the grep in my example might be better as `grep php` or similar. – jstephenson May 12 '16 at 04:49
  • I think you found the problem. But what do I do if I still need the mysql connection after popen? – Code May 12 '16 at 11:10
  • Great. To my knowledge there's not really any way to stop this in PHP. Other solutions are outside the scope of the question really, but you could look in to creating the child process differently to try and avoid this (maybe via an 'immediate' cron), or serialize data read from the DB and pass it via stdin (look at proc_open) to the child for processing. Alternatively, if this isn't particularly high volume/performance sensitive, you could simply create another connection after spawning the child! – jstephenson May 12 '16 at 13:38
  • I notice that when I call `fclose(STDIN)` at the start of the script the FDs are not inherited. Any idea what's happening? – Code May 12 '16 at 14:55
  • @tumber033 Yes, that's quite an interesting little 'hack'. Because you close stdin on the first line, you free up the file descriptor numbered zero (which is stdin). When you then open the database connection, the system reuses FD 0 for that socket. Consequently when popen forks, STDIN for the child is this database connection, and you're able to close it. Not particularly elegant! – jstephenson May 12 '16 at 15:47

I had a similar situation using pcntl_fork() and a MySQL connection. The cause here is probably the same.

Background info

popen() creates a child process. The call to pclose() closes the communication channel and the child process continues to run until it exits. This is when things start to go out of control.

When a child process completes, the parent process receives a SIGCHLD signal. The parent process here is the PHP interpreter that runs the code you posted. The child process is the one launched using popen() (it doesn't matter what command it runs).

There is a small detail here that you probably don't know, or that you found in the documentation and ignored because it doesn't make much sense when one programs in PHP. It is mentioned in the documentation of sleep():

If the call was interrupted by a signal, sleep() returns a non-zero value.

The sleep() PHP function is just a wrapper of the sleep() Linux system call (and the usleep() PHP function is a wrapper of the usleep() Linux system call).

What the PHP documentation does not tell you is clearly stated in the documentation of the system calls:

sleep() makes the calling thread sleep until seconds seconds have elapsed or a signal arrives which is not ignored.

Back to your code.

There are two places in your code where the PHP interpreter calls the usleep() Linux system function. One of them is clearly visible: your PHP code invokes it. The other one is hidden (see below).

What happens (the visible part)

Starting with the second iteration, if a child process (created using popen() on a previous iteration) happens to exit while the parent program is inside the usleep(100000) call, the PHP interpreter process receives the SIGCHLD signal and its execution resumes before the timeout elapses. The usleep() call returns earlier than expected. Because the timeout is short, this effect is not observable to the naked eye. Put 10 seconds instead of 0.1 seconds and you'll notice it.

However, apart from the broken timeout, this doesn't affect the execution of your code in a fatal manner.

Why it crashes (the invisible part)

The second place where an incoming signal hurts your program's execution is hidden deep inside the code of the PHP interpreter. For some protocol reasons, the MySQL client library uses sleep() and/or usleep() in several places. If the interpreter happens to be inside one of these calls when the SIGCHLD arrives, the MySQL client library code is resumed unexpectedly and, many times, it concludes with the erroneous status "MySQL server has gone away (error 2006)".

It's possible that your code ignores (or swallows) the MySQL error status (because it doesn't expect it to happen in that place). Mine didn't and I spent a couple of days of investigation to find out the facts summarized above.

A solution

The solution for the problem is easy (after you know all the internal details exposed above). It is hinted in the documentation quote above: "a signal arrives which is not ignored".

The signals can be masked (ignored) when their arrival is not desired. The PHP PCNTL extension provides the function pcntl_sigprocmask(). It wraps the sigprocmask() Linux system call that sets what signals can be received by the program from now on (in fact, what signals to be blocked).

There are two strategies you can implement, depending on what you need.

If your program needs to communicate with the database and be notified when the child processes complete, then you have to wrap all your database calls within a pair of calls to pcntl_sigprocmask() to block, then unblock, the SIGCHLD signal.
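For the first strategy, a sketch (assuming `$db` is your open PDO handle; the query is just an illustration):

```php
// Block SIGCHLD for the duration of the database call so that the
// MySQL client library's internal sleep()/usleep() calls cannot be
// interrupted by a terminating child.
pcntl_sigprocmask(SIG_BLOCK, array(SIGCHLD));

$rows = $db->query('SELECT 1')->fetchAll();

// Allow SIGCHLD to be delivered again.
pcntl_sigprocmask(SIG_UNBLOCK, array(SIGCHLD));
```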

If you don't care when the child processes complete then you just call:

pcntl_sigprocmask(SIG_BLOCK, array(SIGCHLD));

before you start creating any child process (before the while()). It makes your process ignore the termination of the child processes and lets it run its database queries without undesired interruption.

Warning

The default handling of the SIGCHLD signal is to call wait() in order to let the system cleanup after the completed child process. What happens if the signal is not handled (because its delivery is blocked) is explained in the documentation of wait():

A child that terminates, but has not been waited for becomes a "zombie". The kernel maintains a minimal set of information about the zombie process (PID, termination status, resource usage information) in order to allow the parent to later perform a wait to obtain information about the child. As long as a zombie is not removed from the system via a wait, it will consume a slot in the kernel process table, and if this table fills, it will not be possible to create further processes. If a parent process terminates, then its "zombie" children (if any) are adopted by init(1), which automatically performs a wait to remove the zombies.

In plain English, if you block the reception of SIGCHLD signal, then you have to call pcntl_wait() in order to cleanup the zombie child processes.

You can add:

pcntl_wait($status, WNOHANG);

somewhere inside the while loop (just before it ends, for example).
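Applied to the loop in the question, the second strategy looks roughly like this (a sketch; not tested against your exact setup):

```php
// Block SIGCHLD once, before any child process is created.
pcntl_sigprocmask(SIG_BLOCK, array(SIGCHLD));

while (true) {
    usleep(100000);

    // ... the rest of the loop body from the question ...

    // Reap any finished children without blocking, so zombies do not
    // accumulate in the kernel process table.
    pcntl_wait($status, WNOHANG);
}
```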

axiac
  • So much detail in your answer, man. Bravo. Really shows how wrapping system functions without proper documentation is a big negative point in PHP. Also happens with some Windows Date functions, namely because of Daylight Savings Time issues. – henry700 May 14 '16 at 02:55

the script exits without any error or output

Not surprising when there's no error checking in the code. However if it really is "crashing", then:

  • if the cause is trapped by the PHP runtime then it will be trying to log an error. Have you tried deliberately creating an error scenario to verify that the reporting/logging is working as you expect?

  • if the error is not trapped by the PHP runtime, then the OS should be dumping a core file - have you checked the OS config? Looked for the core file? Analyzed it?

$cmd = "nohup php $filePath > /tmp/1.log 2>&1 &";

This probably doesn't do what you think it does. When you run a process in the background with most versions of nohup, it still retains a relationship with the parent process; the parent cannot be reaped until the child process exits - and a child is always spawning another child before it does.

This is not a valid way to keep your code running in the background / as a daemon. What the right approach is depends on what you are trying to achieve. Is there a specific reason for attempting to renew the process every 60 seconds?

(You never explicitly close the database connection - this is less of an issue as PHP should do this when exit is invoked).

You might want to read this and this

symcbean
  • Putting it simply, this process is meant to: wait (poll) for data in the db, then spawn a new process to handle (or wait for) the next batch of data simultaneously, then process the fetched data (possibly time consuming), then exit. If there's no data for some time, it relaunches to avoid potential memory leaks. What approach do you recommend? – Code May 11 '16 at 09:11
  • Preferred approach would be to do the processing synchronously, triggered by the thing which is creating the data in the database. Failing that, daemonize the script properly (consider using DJB's daemontools if you have stability issues) with fork and setsid(). Failing that, run it as a cron job (with concurrency controls). – symcbean May 11 '16 at 11:32

I suggest that the process doesn't exit after pclose. In this case every process holds its own connection to the db. After some time the connection limit of MySQL is reached and new connections fail. To understand what's going on, add some logging before and after the lines DatabaseConnector::Init(); and pclose($p);
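A minimal way to add that logging (a sketch; the log path is arbitrary):

```php
// Append timestamped markers around the suspect calls.
file_put_contents('/tmp/debug.log', date('c') . " before Init\n", FILE_APPEND);
DatabaseConnector::Init();
file_put_contents('/tmp/debug.log', date('c') . " after Init\n", FILE_APPEND);
```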

porfirion