
I took the most basic demo of the pthreads extension for PHP 7 that uses the Pool class (this demo: https://github.com/krakjoe/pthreads#polyfill) and extended it a little so that I can grab results from the tasks (or at least I think I can):

$pool = new Pool(4);

foreach (range(1, 8) as $i) {
    $pool->submit(new class($i) extends Threaded
    {
        public $i;
        private $garbage = false;

        public function __construct($i)
        {
            $this->i = $i;
        }

        public function run()
        {
            echo "Hello World\n";
            $this->result = $this->i * 2;
            $this->garbage = true;
        }

        public function isGarbage() : bool
        {
            return $this->garbage;
        }
    });
}

while ($pool->collect(function(Collectable $task) {
    if ($task->isGarbage()) {
        echo $task->i . ' ' . $task->result . "\n";
    }
    return $task->isGarbage();
})) continue;

$pool->shutdown();

What's confusing me is that it sometimes doesn't get the result for all tasks:

Hello World
Hello World
Hello World
Hello World
Hello World
1 2
2 4
Hello World
Hello World
3 6
Hello World
7 14
4 8
8 16

Here the two lines with 5 10 and 6 12 are missing, and I don't understand why. This happens only occasionally (maybe 1 in 10 runs).

It looks like the original demo targets an older version of pthreads, because it uses the Collectable interface, which is now automatically implemented by Threaded, if I'm not mistaken.

Then the readme says:

The Pool::collect mechanism was moved from Pool to Worker for a more robust Worker and simpler Pool inheritance.

So I guess I'm doing something wrong.

Edit: I took the example from "How does Pool::collect works?" and updated it to work with the latest pthreads and current PHP 7, but the result is the same. It looks like it isn't able to collect results from the last threads that are executed.

$pool = new Pool(4);

while (@$i++<10) {
    $pool->submit(new class($i) extends Thread implements Collectable {
        public $id;
        private $garbage;

        public function __construct($id) {
            $this->id = $id;
        }

        public function run() {
            sleep(1);
            printf(
                "Hello World from %d\n", $this->id);
            $this->setGarbage();
        }

        public function setGarbage() {
            $this->garbage = true;
        }

        public function isGarbage(): bool {
            return $this->garbage;
        }

    });
}

while ($pool->collect(function(Collectable $work){
    printf(
        "Collecting %d\n", $work->id);
    return $work->isGarbage();
})) continue;

$pool->shutdown();

This outputs the following, which clearly does not collect all threads:

Hello World from 1
Collecting 1
Hello World from 2
Collecting 2
Hello World from 3
Collecting 3
Hello World from 4
Collecting 4
Hello World from 5
Collecting 5
Hello World from 6
Hello World from 7
Collecting 6
Collecting 7
Hello World from 8
Hello World from 9
Hello World from 10
martin

2 Answers


As you have quite correctly noted, the code you have copied targets pthreads v2 (for PHP 5.x).

The problem boils down to the fact that the garbage collector in pthreads is not deterministic. This means it will not behave predictably, and so it cannot be reliably used in order to fetch data from the tasks that have been executed by the pool.

One way you could fetch this data would be to pass in Threaded objects into the tasks being submitted to the pool:

<?php

$pool = new Pool(4);
$data = [];

foreach (range(1, 8) as $i) {
    $dataN = new Threaded();
    $dataN->i = $i;

    $data[] = $dataN;

    $pool->submit(new class($dataN) extends Threaded {
        public $data;

        public function __construct($data)
        {
            $this->data = $data;
        }

        public function run()
        {
            echo "Hello World\n";
            $this->data->i *= 2;
        }
    });
}

while ($pool->collect());

$pool->shutdown();

foreach ($data as $dataN) {
    var_dump($dataN->i);
}

There are a few things to note about the above code:

  • Collectable (which is now an interface in pthreads v3) is implemented by the Threaded class already, so there's no need to implement it yourself.
  • Once a task has been submitted to the pool, it is already considered to be garbage, and so there is no need to handle this part yourself. Whilst you still have the ability to override the default garbage collector, this should not be needed in the vast majority of cases (including yours).
  • I still invoke the collect method (in a loop that blocks the main thread until all tasks have finished executing) so that the tasks can be garbage collected (using pthreads' default collector) to free up memory whilst the pool is executing tasks.
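
A variation on the approach above, as a minimal sketch: instead of one Threaded object per task, every task writes into a single shared Threaded container under its own key. The class name DoubleTask and the variable $results are hypothetical, chosen only for illustration. The sketch assumes pthreads v3 on a thread-safe PHP 7 CLI build, and it relies on Pool::shutdown() blocking until all submitted tasks have executed.

```php
<?php
// Sketch only: requires the pthreads v3 extension (PHP 7, ZTS build, CLI).

// Hypothetical task class: doubles its input and stores the result
// in a container shared with the main thread.
class DoubleTask extends Threaded
{
    private $i;
    private $results;

    public function __construct(int $i, Threaded $results)
    {
        $this->i = $i;
        $this->results = $results;
    }

    public function run()
    {
        // Fetch the shared container into a local variable first, then
        // write. Each task uses its own key, so no two tasks write the
        // same slot.
        $results = $this->results;
        $results[$this->i] = $this->i * 2;
    }
}

$pool = new Pool(4);
$results = new Threaded();

foreach (range(1, 8) as $i) {
    $pool->submit(new DoubleTask($i, $results));
}

// shutdown() waits for every submitted task to run before it returns,
// so the container is complete at this point.
$pool->shutdown();

// Entries appear in the order the tasks happened to finish.
foreach ($results as $i => $value) {
    echo $i . ' ' . $value . "\n";
}
```

Because the results are read only after shutdown() has returned, this does not depend on when, or whether, the collector visits a given task.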
tpunt

I had a similar problem: the collect loop would end almost immediately. It turns out that collect returns once all work is in progress, not once all work has completed. In some cases the callback was never even invoked for a task, so nothing was ever collected.

So if I had a pool size of 4 and submitted just 3 tasks, the collect callback would never run and execution would continue immediately. Example:

define ("CRLF", "\r\n");

class AsyncWork extends Thread {
  private $done = false;
  private $id;

  public function __construct($id) {
    $this->id = $id;
  }

  public function id() {
    return $this->id;
  }

  public function isCompleted() {
    return $this->done;
  }

  public function run() {
    echo '[AsyncWork] ' . $this->id . CRLF;
    sleep(rand(1,5));
    echo '[AsyncWork] sleep done ' . $this->id . CRLF;
    $this->done = true;
  }
}

$pool = new Pool(4);

for($i=1;$i<=3;$i++) {
  $pool->submit(new AsyncWork($i));
}

while ($pool->collect(function(AsyncWork $work){
    echo 'Collecting ['.$work->id().']: ' . ($work->isCompleted()?1:0) . CRLF;
    return $work->isGarbage();
})) continue;

echo 'ALL DONE' . CRLF;

$pool->shutdown();

would output

[AsyncWork] 1
[AsyncWork] 2
ALL DONE
[AsyncWork] 3
[AsyncWork] sleep done 2
[AsyncWork] sleep done 3
[AsyncWork] sleep done 1

If I changed the above code to submit more work than the pool size, it would collect until all work was in progress. E.g.:

for($i=1;$i<=10;$i++) {
  $pool->submit(new AsyncWork($i));
}

//results:

[AsyncWork] 1
[AsyncWork] 2
[AsyncWork] 3
[AsyncWork] 4
[AsyncWork] sleep done 4
[AsyncWork] 8
Collecting [4]: 1
[AsyncWork] sleep done 1
Collecting [1]: 1
[AsyncWork] 5
[AsyncWork] sleep done 3
Collecting [3]: 1
[AsyncWork] 7
[AsyncWork] sleep done 2
Collecting [2]: 1
[AsyncWork] 6
[AsyncWork] sleep done 6
Collecting [6]: 1
[AsyncWork] 10
[AsyncWork] sleep done 7
Collecting [7]: 1
[AsyncWork] sleep done 8
Collecting [8]: 1
[AsyncWork] sleep done 5
Collecting [5]: 1
ALL DONE
[AsyncWork] 9
[AsyncWork] sleep done 9
[AsyncWork] sleep done 10

As you can see, it never collects the last tasks and it returns before the work is done.

The only way I could solve this was to handle completion tracking myself, by keeping a list of the submitted tasks.

$pool = new Pool(4);

$worklist = [];
for($i=1;$i<=10;$i++) {
  $work = new AsyncWork($i);
  $worklist[] = $work;
  $pool->submit($work);
}

do {
  $alldone = true;
  foreach($worklist as $i=>$work) {
    if (!$work->isCompleted()) {
      $alldone = false;
    } else {
      echo 'Completed: '. $work->id(). CRLF;
      unset($worklist[$i]);
    }
  }

  if ($alldone) {
    break;
  }
} while(true);

while ($pool->collect(function(AsyncWork $work){
    echo 'Collecting ['.$work->id().']: ' . ($work->isCompleted()?1:0) . CRLF;
    return $work->isGarbage();
})) continue;

echo 'ALL DONE' . CRLF;

$pool->shutdown();

This was the only way I could make sure ALL DONE was printed only when everything was, in fact, done.

[AsyncWork] 1
[AsyncWork] 2
[AsyncWork] 3
[AsyncWork] 4
[AsyncWork] sleep done 1
[AsyncWork] 5
Completed: 1
[AsyncWork] sleep done 2
Completed: 2
[AsyncWork] 6
[AsyncWork] sleep done 4
[AsyncWork] 8
Completed: 4
[AsyncWork] sleep done 6
[AsyncWork] sleep done 3
[AsyncWork] 7
Completed: 6
Completed: 3
[AsyncWork] sleep done 5
Completed: 5
[AsyncWork] 10
[AsyncWork] 9
[AsyncWork] sleep done 9
Completed: 9
[AsyncWork] sleep done 8
Completed: 8
[AsyncWork] sleep done 7
Completed: 7
[AsyncWork] sleep done 10
Completed: 10
Collecting [1]: 1
Collecting [5]: 1
Collecting [9]: 1
Collecting [2]: 1
Collecting [6]: 1
Collecting [10]: 1
Collecting [3]: 1
Collecting [7]: 1
Collecting [4]: 1
Collecting [8]: 1
ALL DONE
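
The polling loop above re-checks the task list continuously, which keeps the main thread busy while it waits. A possible refinement, as a sketch only: sleep briefly between passes. PolledWork is a hypothetical, stripped-down stand-in for AsyncWork, and the 50 ms interval is an arbitrary choice. As with the code above, this assumes pthreads v3 on a thread-safe PHP 7 CLI build.

```php
<?php
// Sketch only: requires the pthreads v3 extension (PHP 7, ZTS build, CLI).

// Hypothetical stand-in for AsyncWork, reduced to the completion flag.
class PolledWork extends Threaded
{
    private $done = false;

    public function run()
    {
        usleep(200000); // simulate 0.2 s of work
        $this->done = true;
    }

    public function isCompleted(): bool
    {
        return (bool) $this->done;
    }
}

$pool = new Pool(4);

$worklist = [];
for ($i = 1; $i <= 10; $i++) {
    $worklist[$i] = new PolledWork();
    $pool->submit($worklist[$i]);
}

// Poll until nothing is pending, sleeping between passes so the main
// thread does not spin at full CPU while the workers are busy.
$pending = $worklist;
while ($pending) {
    $pending = array_filter($pending, function (PolledWork $work) {
        return !$work->isCompleted();
    });
    if ($pending) {
        usleep(50000); // 50 ms between checks (arbitrary)
    }
}

echo "ALL DONE\n";

$pool->shutdown();
```

Keeping the tasks in $worklist also keeps a reference to each one on the main thread, so their state stays readable regardless of what the collector does.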
Hugo Delsing