I am trying to parse a folder full of .htm files. Each file contains one specific element that needs to be removed: a td element with class="hide". This is my code so far; $dir . $entry is the full path to the file.

$page = ($dir . $entry);
$this->domDoc->loadHTMLFile($page);
// Use xpath query to find the menu and remove it
$nodeList = $xpath->query('//td[@class="hide"]');

Unfortunately, this is where things already go wrong. If I do a var_dump of the node list, I get the following:

object(DOMNodeList)#5 (0) { } 

Just so you folks get an idea of what I'm trying to select, here's an excerpt:

<td width="160" align="left" valign="top" class="hide">
    lots of other TD's and content here
</td>

Does anybody see anything wrong with what I've come up with so far?

Jon7
Jens Eeckhout
3 Answers

6

Is your initial file XHTML (i.e. with <html xmlns="http://www.w3.org/1999/xhtml">)? If so, your elements will be namespaced, and you'll need to set up a prefix mapping using $xpath->registerNamespace and then use this prefix in the expression:

$xpath->registerNamespace('xhtml', 'http://www.w3.org/1999/xhtml');
$nodeList = $xpath->query('//xhtml:td[@class="hide"]');
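
To see the effect in isolation, here is a minimal sketch. It assumes the markup is parsed as XML with loadXML, which may not match how loadHTMLFile treats the xmlns attribute in your files; the XHTML string and the variable names are made up for illustration.

```php
<?php
// Hypothetical XHTML input: the xmlns declaration puts every element
// into the XHTML namespace when the document is parsed as XML.
$xhtml = <<<XML
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <table><tr><td class="hide">menu</td></tr></table>
  </body>
</html>
XML;

$dom = new DOMDocument();
$dom->loadXML($xhtml); // XML parsing keeps the namespace information

$xpath = new DOMXPath($dom);

// An unprefixed name only matches elements in no namespace.
$plain = $xpath->query('//td[@class="hide"]');

// After registering a prefix, the namespaced td can be addressed.
$xpath->registerNamespace('xhtml', 'http://www.w3.org/1999/xhtml');
$prefixed = $xpath->query('//xhtml:td[@class="hide"]');

echo $plain->length, "\n";    // 0
echo $prefixed->length, "\n"; // 1
```

If the unprefixed query already finds the element in your real files, namespaces are probably not the cause.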
Ian Roberts
  • this was the issue for me, I actually simply disabled the namespaces entirely while using this library https://github.com/Masterminds/html5-php – Timo Huovinen Aug 14 '15 at 10:20
  • Same for me (processing PHPUnit coverage XML):
```php
$xml = new DOMDocument;
$xml->preserveWhiteSpace = false;
$xml->load($coverageIndex);
$xml = new DOMXPath($xml);
$xml->registerNamespace('phpunit', 'https://schema.phpunit.de/coverage/1.0');
$items = $xml->query('//phpunit:build');
```
    – Adam Mar 30 '23 at 06:41
5

Calling var_dump on an XPath node list object doesn't show its contents. Dump the node list's length instead:

var_dump($nodeList->length);

If the value is over 0, then you can iterate over it using foreach:

foreach ($nodeList as $node) {
    var_dump($node->tagName);
}

Hope this helps.

For further clarification, here is a full working code snippet:

<?php
$html = <<<END
<html>
    <body>
        <td>

        </td>
        <td class="hide"></td>
        <td class="hide"></td>
    </body>
</html>
END;
$dom = new DOMDocument;
$dom->loadHTML($html);
// Create the XPath object after the document has been loaded
$xpath = new DOMXPath($dom);
$nodeList = $xpath->query('//td[@class="hide"]');
// May show an apparently blank object, depending on the PHP version
var_dump($nodeList);
// Shows 2
var_dump($nodeList->length);
// Echo out all the tag names.
foreach ($nodeList as $node) {
    echo $node->tagName . "\n";
}
?>
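
Since the goal in the question is to remove the matched element, the following is a rough sketch of that step once the query returns results. The sample markup and variable names are hypothetical. Copying the node list into an array before removing anything is a defensive choice, not something the question's code requires.

```php
<?php
// Hypothetical sample markup standing in for one of the .htm files.
$html = '<table><tr><td class="hide">menu</td><td>content</td></tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Copy the matches into a plain array so that removing nodes
// cannot interfere with the iteration.
$matches = iterator_to_array($xpath->query('//td[@class="hide"]'));

foreach ($matches as $node) {
    // A node is removed through its parent.
    $node->parentNode->removeChild($node);
}

$remaining = $xpath->query('//td')->length;
echo $remaining, "\n"; // 1

// Serialize the modified document; writing it back to disk is left out here.
$output = $dom->saveHTML();
```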
Kyle
  • You're absolutely right about the var_dump. I've changed that now. I've also checked my code yet again, and I see no difference compared to your snippet. Thank you for your reply though. Unfortunately I still get no output (the length of the node list returns `int(0)`). – Jens Eeckhout Oct 02 '12 at 14:17
3

Maybe you have more than one class in the class attribute of your td element:

<td class="hide anotherclass">

So '//td[@class="hide"]' would only match:

<td class="hide">

Try it like this to see if it contains the hide class you are looking for:

$nodeList = $xpath->query('//td[contains(@class,"hide")]');

Check out this blog post: XPath: Select element by class
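
One caveat with contains(@class,"hide") is that it is a substring test, so it would also match a class such as "hidden". A common XPath 1.0 idiom pads the attribute with spaces and looks for the whole token instead. Here is a small sketch comparing the two; the sample markup is made up for illustration.

```php
<?php
// Hypothetical markup: one exact match, one multi-class match,
// and one class that merely contains the substring "hide".
$html = '<table><tr>'
      . '<td class="hide">a</td>'
      . '<td class="hide other">b</td>'
      . '<td class="hidden">c</td>'
      . '</tr></table>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Substring test: matches all three cells, including class="hidden".
$loose = $xpath->query('//td[contains(@class, "hide")]');

// Token test: pad the normalized class list with spaces and look for " hide ".
$exact = $xpath->query(
    '//td[contains(concat(" ", normalize-space(@class), " "), " hide ")]'
);

echo $loose->length, "\n"; // 3
echo $exact->length, "\n"; // 2
```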

Bogdan
  • Good advice. I double-checked this, but "hide" really is the only class the TD has. Still, thank you, interesting reply. – Jens Eeckhout Oct 02 '12 at 14:16