
Using this code as an example to test:

$dom = new DOMDocument();
$dom->preserveWhiteSpace = TRUE;
@ $dom->loadHTML($html);

$mydocnodes = $dom->getElementsByTagName('*');

foreach($mydocnodes as $node) {

    $title_text = $node->textContent;
    $tag_text = $node->tagName;

    print $title_text . " in a " . $tag_text . " and my next sibling is " . $node->nextSibling->tagName . "<br>";

}

When the HTML for this is all on one line such as:

<html><body><h1>hello</h1><p>I am text</p><p>I am text</p></body></html>

nextSibling works fine. However, when the HTML is formatted across multiple lines as below, it does not work: the tagName of each nextSibling comes back null. It appears as though a sibling has to be on the exact same line, not just at the same level in the DOM.

<html>
    <body>
        <h1>hello</h1>
        <p>I am text</p>
        <p>I am text</p>
    </body>
</html>

Given that most HTML is formatted across multiple lines, how can I load my HTML into the DOMDocument so that the next and previous siblings work?

Many thanks!

  • You are, I believe, encountering whitespace text nodes (`XML_TEXT_NODE`). Only element nodes (`XML_ELEMENT_NODE`) have a `tagName`. In the single-line HTML the nextSibling refers to an element node, whereas in the formatted HTML the nextSibling refers to the whitespace text node between the elements – Professor Abronsius Jun 14 '23 at 08:05
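
To illustrate the comment above, here is a minimal sketch that skips the whitespace text nodes and returns the next sibling that is an element. The helper name `nextElementSiblingOf` is hypothetical (it is not part of the DOM extension), and the sample markup is the formatted HTML from the question.

```php
<?php
// Hypothetical helper: walk forward past text, comment and other
// non-element nodes until an element sibling is found (or none is left).
function nextElementSiblingOf(DOMNode $node): ?DOMElement
{
    for ($sibling = $node->nextSibling; $sibling !== null; $sibling = $sibling->nextSibling) {
        if ($sibling instanceof DOMElement) {
            return $sibling;
        }
    }
    return null;
}

$html = <<<HTML
<html>
    <body>
        <h1>hello</h1>
        <p>I am text</p>
        <p>I am text</p>
    </body>
</html>
HTML;

$dom = new DOMDocument();
@$dom->loadHTML($html);

foreach ($dom->getElementsByTagName('*') as $node) {
    $next = nextElementSiblingOf($node);
    // The last element at each level has no element sibling, so guard against null.
    $nextName = ($next === null) ? '(none)' : $next->tagName;
    print $node->tagName . " and my next element sibling is " . $nextName . "\n";
}
```

On PHP 8.0 or later, the built-in `nextElementSibling` and `previousElementSibling` properties of `DOMElement` should give the same result without a helper. Either way, the last element at each level has no following element, so the null check is still needed.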
