
I have a PHP 7 CLI daemon which serially parses JSON files larger than 50 MB. I'm trying to save every 1000 entries of parsed data to MySQL using a separate process spawned with `pcntl_fork()`, and for ~200k rows it works fine.

Then I get `pcntl_fork(): Error 35`.

I assume this is happening because MySQL insertion becomes slower than parsing, so more and more forks are spawned until CentOS 6.3 can't handle them any more.

Is there a way to catch this error and fall back to single-process parsing and saving? Or is there a way to check the child process count?
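A minimal sketch of the fallback I have in mind (the `$work` callable is a placeholder for my actual save step, not my real code): if `pcntl_fork()` returns -1, do the work in the current process instead of a child.

```php
<?php
// Hedged sketch: wrap the fork so that a failed fork (pcntl_fork()
// returning -1, e.g. on EAGAIN / "Error 35") falls back to doing the
// work in the current process instead of crashing.
function forkOrFallback(callable $work): void
{
    $pid = pcntl_fork();
    if ($pid === -1) {
        // fork failed -- log why and do the work in this process
        fwrite(STDERR, 'fork failed: ' . pcntl_strerror(pcntl_get_last_error()) . "\n");
        $work();
    } elseif ($pid === 0) {
        $work(); // child does the save
        exit(0);
    } else {
        // parent: opportunistically reap any finished children
        pcntl_waitpid($pid, $status, WNOHANG);
    }
}
```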

Artjom Kurapov
  • 35 is `EAGAIN`, which means you've hit the process limit. Maybe you're not calling `pcntl_wait` to clean up processes when they're done? – Barmar Nov 22 '16 at 17:14
  • What is the max fork process limit in CentOS? It should be calculable: 200k records will spawn roughly 200 forks. "Or is there a way to check child process count" – this can be done manually by keeping the PIDs. A better approach would be starting multiple daemons that each pick up a piece of work, or running it with `proc_open` http://php.net/proc_open – Sander Visser Nov 22 '16 at 17:15
  • So why would forking be the solution to the problem? You're parsing some data and then using a separate process to hit MySQL concurrently – why would that solve anything? What's wrong with using transactions within the same process? You're obviously thinking that if you fork into N processes it will be N times faster, except it won't. Now we have a problem with a bad solution, and an answer for that bad solution. If you want your inserts to be fast, group those 1000 rows within the same transaction. That will spend the least amount of I/O for writing. – Mjh Nov 23 '16 at 11:26
  • @Mjh I'm already grouping data in memory for mass SQL inserts, but the SQL insert itself takes time. I want that time to be spent on actual input stream processing instead. Offloading this to a separate process works fine for me, achieving 1.5k inserts per second. – Artjom Kurapov Nov 23 '16 at 20:59
  • Grouping it in memory serves literally no purpose. Having two processes, one for input and one for inserts, is the way to go about this. However, your project, your code, and ultimately your time. The worst part is that someone might think this is the way to go – it isn't. Good luck to you. – Mjh Nov 24 '16 at 08:30
  • It does serve a purpose. Doing 1 insert with 100 rows is faster than 100 inserts with 1 row. – Artjom Kurapov Nov 25 '16 at 11:41
  • This [stack overflow post is probably a better solution](https://stackoverflow.com/questions/9976441/terminating-zombie-child-processes-forked-from-socket-server/10114945#10114945) for servers that don't care about their forked children. – Richard Tyler Miles Jul 23 '21 at 06:31
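On the multi-row insert point raised in the comments, a hedged sketch of how the batching might look (table and column names here are hypothetical, not from the question): build one `INSERT … VALUES (…), (…), …` statement for the whole batch, so 1000 rows cost one round trip instead of 1000.

```php
<?php
// Hypothetical helper: build a single multi-row INSERT with one
// "(?, ?, ...)" placeholder group per row, for use with PDO::prepare().
function buildBatchInsert(string $table, array $columns, int $rowCount): string
{
    $group = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    return sprintf(
        'INSERT INTO %s (%s) VALUES %s',
        $table,
        implode(', ', $columns),
        implode(', ', array_fill(0, $rowCount, $group))
    );
}
```

Usage with an assumed `$pdo` connection and `$rows` array of row arrays might look like `$pdo->prepare(buildBatchInsert('entries', ['id', 'payload'], count($rows)))->execute(array_merge(...array_map('array_values', $rows)));`.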

1 Answer


Here is the solution I ended up with, based on @Sander Visser's comment. The key part is checking the number of live child processes and falling back to the current process if there are too many of them:

```php
class serialJsonReader
{
    const MAX_CHILD_PROCESSES = 50;

    private $child_processes = []; // stores PIDs of live children

    private function flushCachedDataToStore()
    {
        // too many children already: resort to a single process
        if (count($this->child_processes) > self::MAX_CHILD_PROCESSES) {
            $this->checkChildProcesses();
            $this->storeCollectedData(); // main work here
        }

        // otherwise fork as much as possible
        else {
            $pid = pcntl_fork();
            if ($pid === -1) {
                die('could not fork');
            }
            elseif ($pid === 0) {
                $this->storeCollectedData(); // main work here (child)
                exit();
            }
            else {
                $this->child_processes[] = $pid;
                $this->checkChildProcesses();
            }
        }
    }

    private function checkChildProcesses()
    {
        if (count($this->child_processes) > self::MAX_CHILD_PROCESSES) {
            foreach ($this->child_processes as $key => $pid) {
                $res = pcntl_waitpid($pid, $status, WNOHANG);

                // the process has already exited (or no longer exists)
                if ($res == -1 || $res > 0) {
                    unset($this->child_processes[$key]);
                }
            }
        }
    }
}
```
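As the post linked in the comments suggests, an alternative to falling back to in-process work is to block in `pcntl_wait()` until a slot frees up. A hedged sketch, assuming the same bookkeeping fields as above (the `ForkPool` class name is made up for illustration):

```php
<?php
// Hedged alternative: when the child limit is hit, block until any
// child exits and free its slot, instead of doing the work in-process.
class ForkPool
{
    const MAX_CHILD_PROCESSES = 50;

    private $child_processes = []; // stores PIDs of live children

    public function waitForSlot(): void
    {
        while (count($this->child_processes) >= self::MAX_CHILD_PROCESSES) {
            $pid = pcntl_wait($status); // blocks until any child exits
            if ($pid <= 0) {
                break; // no children left, or an error: stop waiting
            }
            $key = array_search($pid, $this->child_processes, true);
            if ($key !== false) {
                unset($this->child_processes[$key]);
            }
        }
    }
}
```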
Artjom Kurapov
  • Please note this [post too](https://stackoverflow.com/questions/9976441/terminating-zombie-child-processes-forked-from-socket-server/10114945#10114945). – Richard Tyler Miles Jul 23 '21 at 06:32