
I am trying to use threads in PHP (pthreads). I create a DOMDocument in the constructor, but for some reason the newly created document, although assigned to a member variable, disappears.

Code-listing 1:

    class workerThread extends Thread {

        private $document;
        private $i;
        public function __construct($i){
            $this->document = new DOMDocument();
            $this->i = $i;
        }

        public function run(){
            try{
                $root = $this->document->createElement("Root");//can't fetch this
                $this->document->appendChild($root);
            }catch(RuntimeException $e){
                return false;
            }
        }

        public function getRoot(){
            return $this->document->documentElement;
        }
    }

    for($i=0;$i<10;$i++){
        $workers[$i]    = new workerThread($i);
        $workers[$i]->start();
    }

    for($i=0;$i<10;$i++){
        $workers[$i]->join();
    }


I tried instantiating the DOMDocument outside the constructor and passing it in as a constructor argument, as in code-listing 2, but it doesn't change anything.

Code-listing 2:

    for($i=0;$i<10;$i++){
        $documents[$i]  = new DOMDocument();
        $workers[$i]    = new workerThread($documents[$i], $i);
        $workers[$i]->start();
    }

The constructor looks like this:

Code-listing 3:

    public function __construct($doc, $i){
        $this->document = $doc;
        $this->i = $i;
    }

I want to be able to create the DOMDocument outside or inside the thread (whether in the constructor, the run function or another function), use it in the run function, and retrieve its root from outside the thread that processed it.

Paiku Han

2 Answers

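    class workerThread extends Thread {

        public $xml; // XML string produced by run(); safe to read after join()
        private $i;

        public function __construct($i){
            $this->i = $i;
        }

        public function run(){
            try{
                // Build the document locally; a DOMDocument stored on $this
                // does not survive pthreads' internal serialization.
                $document = new DOMDocument();
                $root = $document->createElement("Root");
                $document->appendChild($root);
                $this->xml = $document->saveXML();  // <---
            }catch(DOMException $e){
                return false;
            }
        }

        public function getRoot(){
            // Rebuild a document from the stored XML string.
            $document = new DOMDocument();
            $document->loadXML($this->xml);
            return $document->documentElement;
        }
    }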

    for($i=0;$i<10;$i++){
        $workers[$i] = new workerThread($i);
        $workers[$i]->start();
    }

    for($i=0;$i<10;$i++){
        $workers[$i]->join();
        echo $workers[$i]->xml; // <--- the XML string is available after join()
    }

Let’s start off easy with a simple web crawling example.

    <?php
    class SearchGoogle extends Thread
    {
        public function __construct($query)
        {
            $this->query = $query;
        }

        public function run()
        {
            $this->html = file_get_contents('http://google.fr?q='.$this->query);
        }
    }

Once join is called, we can be sure the class is holding our results:

    $job = new SearchGoogle('cats');
    $job->start();

    // Wait for the job to be finished and print results
    $job->join();
    echo $job->html;

For more detail, read the full article: https://blog.madewithlove.be/post/thread-carefully/

Scaffold

The problem is that the DOMDocument class cannot be reliably serialized. As per the manual on the serialize function:

Note that many built-in PHP objects cannot be serialized. However, those with this ability either implement the Serializable interface or the magic __sleep() and __wakeup() methods. If an internal class does not fulfill any of those requirements, it cannot reliably be serialized.

As we can see, the DOMDocument class (and by extension, its parent class, DOMNode) does not implement the Serializable interface, nor does it implement the required __sleep() and __wakeup() methods. Thus, we should not be serializing the DOMDocument class.
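The effect can be seen without pthreads at all. The following is a minimal sketch that round-trips a DOMDocument through serialize() and unserialize(). The helper name survivesSerialization is hypothetical, and the exact failure mode is assumed to depend on the PHP version (older releases appear to hand back an empty, unusable object, newer ones throw), so the sketch treats any error as a failed round trip.

```php
<?php
// Hypothetical helper: reports whether a DOMDocument still has its root
// element after a serialize()/unserialize() round trip.
function survivesSerialization(DOMDocument $document): bool
{
    try {
        $copy = unserialize(serialize($document));

        // Older PHP versions may emit a warning on this property access.
        return $copy instanceof DOMDocument
            && $copy->documentElement !== null;
    } catch (Throwable $e) {
        // Newer PHP versions refuse to serialize DOM objects outright.
        return false;
    }
}

$document = new DOMDocument();
$document->appendChild($document->createElement('Root'));

var_dump(survivesSerialization($document));
```

If the reasoning above holds, the helper should report that the root element did not survive, which matches the "can't fetch this" symptom in the question.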

So how does serialization relate to this problem? pthreads internally serializes every property of a Threaded object that is an array or a non-Threaded object (Closure excepted). This means that when you assign the new DOMDocument object to the workerThread::$document property, pthreads serializes it and stores it as a string. When you fetch the property, pthreads unserializes it for you to use as normal. (This is just one of many hacks pthreads has to employ in order to work safely in a multithreaded environment.)

Thus, it does not matter where you assign this property - inside the constructor, outside of the class, or inside of the workerThread::run method - you're still going to get the same problem.

So what's the solution? There are a couple of things you could do. You could save only the generated XML to the workerThread::$document property, or you could extend DOMDocument with a serialization-friendly subclass and use that for the workerThread::$document property. Either way, you will need to convert the DOMDocument object to XML, since that is the only safe way to store it as a Threaded property (directly or indirectly).
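For the second option, a serialization-friendly subclass might look roughly like the sketch below. The class name SerializableDocument is hypothetical, and the sketch assumes PHP 7.4 or later for the __serialize() and __unserialize() hooks. It has only been reasoned about with plain serialize()/unserialize(); whether pthreads' internal serialization honours these hooks in the same way is an assumption.

```php
<?php
// Hypothetical subclass, not part of PHP or pthreads: it serializes itself
// as an XML string and rebuilds the tree when unserialized.
// Assumes PHP 7.4+ for the __serialize()/__unserialize() hooks.
class SerializableDocument extends DOMDocument
{
    public function __serialize(): array
    {
        return ['xml' => $this->saveXML()];
    }

    public function __unserialize(array $data): void
    {
        // unserialize() skips the constructor, so initialise the document
        // explicitly before loading the stored XML.
        parent::__construct();
        $this->loadXML($data['xml']);
    }
}

$document = new SerializableDocument();
$document->appendChild($document->createElement('Root'));

$copy = unserialize(serialize($document));
echo $copy->documentElement->nodeName, PHP_EOL;
```

Note that this still converts the document to XML and back on every round trip, so it is a convenience wrapper around the first option, not a way to avoid the conversion.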

Applying the first (and simpler) solution, here is a working version of your workerThread class:

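    class workerThread extends Thread
    {
        private $document; // holds the XML string, never a DOMDocument object
        private $i;

        public function __construct($i)
        {
            $this->i = $i;
        }

        public function run()
        {
            // Keep the DOMDocument in a local variable so pthreads never
            // has to serialize it.
            $document = new DOMDocument();

            try {
                $root = $document->createElement("Root");
                $document->appendChild($root);
            } catch (DOMException $e) { // DOM methods throw DOMException
                return false;
            }

            // Store only the generated XML on the Threaded object.
            $this->document = $document->saveXML();
        }

        public function getRoot()
        {
            // Rebuild a document from the stored XML string.
            $document = new DOMDocument();
            $document->loadXML($this->document);

            return $document->documentElement;
        }
    }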
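The hand-off that this class relies on can also be tried without pthreads installed. The sketch below uses two hypothetical functions, buildDocumentXml and rootFromXml, as stand-ins for run() and getRoot(); it only illustrates the XML round trip and is not a replacement for the threaded version.

```php
<?php
// Stands in for the work done inside run(): build the tree locally and
// return only the XML string, which is safe to store on a Threaded object.
function buildDocumentXml(string $rootName): string
{
    $document = new DOMDocument();
    $document->appendChild($document->createElement($rootName));

    return $document->saveXML();
}

// Stands in for getRoot(): rebuild a document from the stored string.
function rootFromXml(string $xml): DOMElement
{
    $document = new DOMDocument();
    $document->loadXML($xml);

    return $document->documentElement;
}

$xml  = buildDocumentXml('Root');
$root = rootFromXml($xml);
echo $root->nodeName, PHP_EOL;
```

Each call to rootFromXml parses the string again and returns a node from a fresh document, so changes made to that node are not written back to the stored XML.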
tpunt