
I want to extract microdata values from an HTML document.

I'm running a Yii app from the command line, over PuTTY.

The following code does not produce any output.

$this->input->html holds the entire HTML source of the document.

I suspect something breaks around $content = new DOMXPath($dom);, but I don't know why.

If anyone knows what's wrong, please help.

$dom = new DOMDocument();

$html = $this->input->html;

// Note: the heredoc interpolates $html, so $html becomes the literal
// text "echo " followed by the document source and a trailing ";".
$html = <<<HTML
echo $html;
HTML;

// The @ suppresses any warnings loadHTML() raises while parsing.
@$dom->loadHTML($html);

echo $html;

$content = new DOMXPath($dom);

print_r($content);

// find price
try {
    echo '1'.$this->getMicrodataAttribute($content, 'http://data-vocabulary.org/Offer', 'price');
    $this->output->productPrice = $this->getMicrodataAttribute($content, 'http://data-vocabulary.org/Offer', 'price');
    //echo 'result output product price: '.$this->output->productPrice.PHP_EOL;
} catch (Exception $e) {

}
// find title
try {
    $this->output->productTitle = $this->getMicrodataAttribute($content, 'http://data-vocabulary.org/Product', 'name');
    // If no microdata name was found, fall back to the <title> element.
    if (!$this->output->productTitle)
        if (preg_match("#<title>(.+)<\/title>#iU", $this->input->html, $t)) {
            $this->output->productTitle = trim($t[1]);
        }
} catch (Exception $e) {

}
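To see what the heredoc step does to the markup, here is a minimal sketch with a hypothetical sample string standing in for $this->input->html. Because a heredoc interpolates variables, the result is the original markup wrapped in the literal text "echo " and ";", so loadHTML() ends up parsing that extra text around the document.

```php
<?php
// Hypothetical stand-in for $this->input->html
$html = '<html><head><title>Demo</title></head><body><p>Hi</p></body></html>';

// The same heredoc as in the question: $html is interpolated inside it
$wrapped = <<<HTML
echo $html;
HTML;

var_dump($wrapped === 'echo ' . $html . ';'); // bool(true)
echo $wrapped, PHP_EOL;
```

Passing $this->input->html to loadHTML() directly avoids the extra text entirely.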

And this is the function that should extract the microdata values:

public function getMicrodataAttribute($content, $itemtype, $itemprop) {
    // Elements with the given itemprop nested inside an element with the given itemtype
    $tags = $content->query("//*[@itemtype=\"$itemtype\"]//*[@itemprop=\"$itemprop\"]");
    //print_r($tags);
    if ($tags) {
        foreach ($tags as $tag) {
            //die('dd');
            // Only the first match is used: its content attribute if set, else its text
            if (!$tag->getAttribute('content')) {
                return $tag->nodeValue;
            }
            return $tag->getAttribute('content');
        }
    }
    return null;
}
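For reference, the same lookup can be exercised outside Yii as a standalone script. This is a minimal sketch, assuming PHP 7.1+ and a hypothetical sample page; it mirrors the question's flow (price, then name, then a <title> fallback), but uses an XPath //title query in place of the regex.

```php
<?php
// Hypothetical sample page using the data-vocabulary.org types from the question
$html = <<<HTML
<html><head><title> Blue Widget | Example Shop </title></head><body>
<div itemscope itemtype="http://data-vocabulary.org/Product">
  <span itemprop="name">Blue Widget</span>
  <div itemprop="offerDetails" itemscope itemtype="http://data-vocabulary.org/Offer">
    <span itemprop="price" content="19.99">19.99 USD</span>
  </div>
</div>
</body></html>
HTML;

/**
 * Return the first itemprop value found inside an element with the given
 * itemtype: its content attribute if non-empty, otherwise its trimmed text.
 */
function getMicrodataAttribute(DOMXPath $xpath, string $itemtype, string $itemprop): ?string
{
    $query = sprintf('//*[@itemtype="%s"]//*[@itemprop="%s"]', $itemtype, $itemprop);
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    $node = $nodes->item(0);
    $content = $node->getAttribute('content');
    return $content !== '' ? $content : trim($node->textContent);
}

libxml_use_internal_errors(true); // keep HTML parse warnings out of the output
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$price = getMicrodataAttribute($xpath, 'http://data-vocabulary.org/Offer', 'price');
$title = getMicrodataAttribute($xpath, 'http://data-vocabulary.org/Product', 'name');
if ($title === null) {
    // Fall back to the <title> element when no microdata name exists
    $titles = $xpath->query('//title');
    $title = $titles->length > 0 ? trim($titles->item(0)->textContent) : null;
}

echo $price, PHP_EOL; // 19.99
echo $title, PHP_EOL; // Blue Widget
```

Note that a query for a Product property also searches nested items such as the Offer, so a property name shared by both types would return whichever element comes first in the document.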
Grampa
Ionut Flavius Pogacian

1 Answer


I think the code that loads your HTML text into the DOMDocument is a bit convoluted, and an error may be hiding in there. Try this:

$dom = new DOMDocument();
$dom->loadHTML( $this->input->html );
$content = new DOMXPath($dom);
print_r($content);

Note that I removed the @ in front of the loadHTML() call, so any warnings it raises while parsing will now be displayed.
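If the warnings are too noisy to read on the console, libxml can collect them instead of printing them. This is a minimal sketch using PHP's libxml_use_internal_errors() and libxml_get_errors(); the malformed sample string is hypothetical, and the exact messages depend on the libxml2 version installed.

```php
<?php
// Hypothetical malformed input: an unclosed <b> and a non-HTML <foo> tag
$html = '<html><body><p>Price: <b>19.99</p><foo>bar</foo></body></html>';

// Collect parse warnings instead of printing them (or hiding them with @)
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

$errors = libxml_get_errors();
foreach ($errors as $error) {
    // Each entry is a LibXMLError with level, code, line, column and message
    printf("line %d, column %d: %s\n", $error->line, $error->column, trim($error->message));
}

libxml_clear_errors();
libxml_use_internal_errors($previous); // restore the previous setting
```

loadHTML() still returns true for input like this; the collected errors are the only sign that the markup was repaired.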

Grampa
  • After a day of inspecting the code line by line, I found the error: it was somewhere else entirely, far away from this code. This code works fine. Thanks for answering! – Ionut Flavius Pogacian Aug 23 '12 at 06:49