As all of you know, when you fork, the child gets a copy of everything, including file and network descriptors (see man fork).

In PHP, when you use pcntl_fork, all of the connections you created with mysql_connect are copied, and this is somewhat of a problem (see the PHP docs and the SO question). Common sense in this situation says: close the parent's connection, create a new one, and let the child use the old one. But what if the parent needs to create many children every few seconds? In that case you end up creating loads of new connections, one for every bunch of forks.

What does that mean in code:

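while (42) {

  // the parent opens a fresh connection on every pass
  $db = mysql_connect($host, $user, $pass);

  // do some stuff with $db
  // ...

  foreach ($jobs as $job) {
    if (($pid = pcntl_fork()) == -1) {
      continue; // fork failed, skip this job
    } else if ($pid) {
      continue; // parent: go on to the next job
    }
    fork_for_job($job); // child: does the job and exits
  }

  mysql_close($db);
  wait_children();
  sleep(5);
}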

function fork_for_job($job) {

  // do something. 
  // does not use the global $db 
  // ...

  exit(0);
}

Well, I do not want to do that: that's way too many connections to the database. Ideally I would like to achieve behaviour similar to this:

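// connect once, outside the loop
$db = mysql_connect($host, $user, $pass);

while (42) {

  // do some stuff with $db
  // ...

  foreach ($jobs as $job) {
    if (($pid = pcntl_fork()) == -1) {
      continue; // fork failed, skip this job
    } else if ($pid) {
      continue; // parent: go on to the next job
    }
    fork_for_job($job); // child: does the job and exits
  }

  wait_children();
  sleep(5);
}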

function fork_for_job($job) {

  // do something
  // does not use the global $db 
  // ...

  exit(0);
}

Do you think it is possible?

Some other things:

  • This is php-cli script
  • I've tried using mysql_pconnect in the first example, but as far as I can tell there is no difference: the MySQL server receives just as many new connections. Maybe that's because this is the CLI and pconnect does not work the way it does in mod_php. As Marc has noticed, pconnect in php-cli does not make sense.
doycho
  • Persistent connections only persist if there's something to keep the connection open. In the case of mod_php, PHP stays active inside the webserver and can hold open the connection. On the CLI, there's nothing left after the script exits, so the connection would close regardless. – Marc B Apr 20 '11 at 16:02
  • You didn't set up forking correctly, you're exhausting MySQL connection pool. Maybe checking the following example and modifying accordingly might help: http://stackoverflow.com/questions/5573214/php-shared-block-memory-and-fork – Michael J.V. Apr 20 '11 at 16:04

3 Answers

The only thing you could try is to let your children wait until every other child has finished its job. That way you could use the same database connection (provided there aren't any synchronization issues). But of course you'll have a lot of processes, which is not very good either (in my experience PHP uses quite a lot of memory). If having multiple processes access the same database connection is not a problem, you could try to make "groups" of processes that share a connection. Then you don't have to wait until each job has finished (you can clean up when the whole group has finished), and you don't have a lot of connections either.

You should ask yourself whether you really need a database connection for your worker processes. Why not let the parent fetch the data and write your results to a file?

If you do need the connection, you should consider using another language for the job. PHP's CLI is itself not a "typical" use case (it was added in 4.3), and multiprocessing is more of a hack than a supported feature.

svens
  • The problem is not "I want to use the same connection in the children to save them from connecting on their own" but rather "I don't want to create a new connection for the parent after each fork, because no one else uses it anyway and I end up reconnecting on every pass of the main loop for no good reason." Maybe the examples were misleading. I've edited them a bit. – doycho Apr 21 '11 at 08:07
  • You could still use the same approach: make groups (fetch the work for multiple children, fork afterwards and throw away the connection). There's no proper solution to your problem; you can only make the overhead smaller (using "standard" PHP). If you need no database connection for your children, you should think about starting independent children (i.e. starting new processes instead of forking); depending on your input data you can pass it as an argument or (worst case) in a file. But this might be even slower than opening a new database connection every few children. – svens Apr 29 '11 at 11:34
  • Well, thank you. As far as I am concerned this answers my question :) – doycho May 04 '11 at 10:43
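
The grouping idea from the comments above could look roughly like the sketch below. It is only an illustration under assumptions: run_group() and fetch_jobs() are hypothetical names that do not appear in the thread, the worker here just returns a number, and the database calls are left as comments because they depend on your setup. It needs the pcntl extension and the CLI.

```php
<?php
// Sketch only: run_group() is a hypothetical helper, not part of PHP.

/**
 * Fork one child per job, then wait for the whole group.
 * Returns a map of child pid => exit code.
 */
function run_group(array $jobs, callable $worker): array
{
    $pids = [];
    foreach ($jobs as $job) {
        $pid = pcntl_fork();
        if ($pid === -1) {
            continue;                   // fork failed, skip this job
        }
        if ($pid === 0) {
            // Child: do the work and leave right away.
            exit((int) $worker($job));
        }
        $pids[] = $pid;                 // parent: remember the child
    }

    $codes = [];
    foreach ($pids as $pid) {
        pcntl_waitpid($pid, $status);   // blocks until this child exits
        $codes[$pid] = pcntl_wexitstatus($status);
    }
    return $codes;
}

// One pass of the parent loop (database part is an assumption):
// $jobs = fetch_jobs($db);  // hypothetical: read the work for several children
// mysql_close($db);         // throw the connection away before forking
$codes = run_group([1, 2, 3], function ($job) {
    return $job;             // stand-in for real work; used as the exit code
});
```

The bigger each group, the fewer reconnects the parent needs, at the price of more processes alive at the same time.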

My advice (from personal experience on the same issue) is to close the connection before pcntl_fork() then open new connections in parent and/or the child process as needed.

If you open a new connection in the parent process, then you have to block the SIGCHLD signal (using pcntl_sigprocmask(SIG_BLOCK, array(SIGCHLD))). No special care is needed in the child processes (except when they also launch their own children, becoming parents this way).

SIGCHLD is a signal that is received by the parent process when one of its children completes.

During communication with the server, the MySQL client library uses nanosleep() to suspend execution of the program for short periods. The sleep family of functions returns when the requested time has passed, but it also returns early if the process receives a signal while it is suspended.

When nanosleep() returns because of a signal (i.e. before enough time has passed), the MySQL library gets confused, reports the error "MySQL server has gone away", and the connection cannot be used any more. It is a false alarm: the MySQL server is still there waiting for queries, but the client code is fooled by a signal that arrived at the wrong moment.

If you are interested in receiving the SIGCHLD signal, you can block it before running a MySQL query and unblock it again afterwards (to avoid it being received during the communication with the MySQL server).
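
A minimal sketch of the block/unblock step, assuming the pcntl extension is loaded. The helper name with_sigchld_blocked() is made up for this example, and the query in the usage comment uses the old mysql_* functions from the question only to show where the call would go.

```php
<?php
// Sketch only: with_sigchld_blocked() is a hypothetical helper, not part of PHP.

/**
 * Run $fn with SIGCHLD blocked, then restore the previous signal mask.
 * A SIGCHLD that arrives in the meantime stays pending and is delivered
 * once the mask is restored.
 */
function with_sigchld_blocked(callable $fn)
{
    // $old receives the set of signals that were blocked before this call.
    pcntl_sigprocmask(SIG_BLOCK, [SIGCHLD], $old);
    try {
        return $fn();
    } finally {
        // Put the mask back exactly as it was.
        pcntl_sigprocmask(SIG_SETMASK, $old);
    }
}

// Usage (assumes $db and $sql exist in your script):
// $result = with_sigchld_blocked(function () use ($db, $sql) {
//     return mysql_query($sql, $db);
// });
```

Restoring the saved mask instead of unconditionally unblocking SIGCHLD keeps the helper safe to call from code that had already blocked the signal.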

Also read this answer and this answer I wrote on similar questions (it's the same information, but with more details and explanation.)

axiac

If the child calls exec() or _exit() fairly quickly, you're alright. The problem is if the child sticks around and holds on to copies of your file descriptors.

You could also use posix_spawn if PHP has an API for that. That might work well.

MarkR