Parsing and editing HTML in PHP can be done using the DOMDocument and DOMNode classes. Other questions already answer how to turn

<div>text <p>test</p> more text</div>

Into:

<div>text <a>test</a> more text</div>

Or how to turn it into this (see "PHP DOMDocument question: how to replace text of a node?"):

<div>text <p>and</p> more text</div>

But how do you replace the node altogether with plain text, and turn it into this?

<div>text and more text</div> 
aphid

1 Answer

It took me some searching to find a generic method.

The key is that text itself also consists of nodes. By default, when a document is loaded, each contiguous block of text is represented by a single text node. Thus the <div> in the example HTML has three child nodes:

  • a text node
  • a <p> node (containing a text node)
  • another text node
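
To see this structure for yourself, a minimal sketch like the following lists the children of the first <div>. The function name describeChildren is hypothetical, made up for this illustration; everything else is the standard DOM extension.

```php
<?php
// Hypothetical helper: describe each child node of the first <div> in $html.
function describeChildren(string $html): array
{
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $div = $doc->getElementsByTagName('div')->item(0);

    $out = [];
    foreach ($div->childNodes as $child) {
        // Text nodes are DOMText instances; elements report their tag name.
        $kind = $child instanceof DOMText ? 'text' : $child->nodeName;
        $out[] = $kind . ': "' . $child->textContent . '"';
    }
    return $out;
}

print_r(describeChildren('<div>text <p>test</p> more text</div>'));
// Lists three entries: text: "text ", p: "test", text: " more text"
```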

To replace the <p> node, we create a new text node and swap it in with replaceChild(). That leaves the <div> with three adjacent text nodes. Finally, to merge them back into one text node (so the result matches what a freshly loaded document would look like), call normalize(), which recursively cleans up after the edits by removing empty text nodes and merging adjacent ones.

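$text = "<div>text <p>test</p> more text</div>";
// loadHTML() is an instance method; calling it statically fails on PHP 8
$doc = new DOMDocument();
$doc->loadHTML($text);
$node = $doc->getElementsByTagName("div")->item(0);
// Child 0 is the text node "text ", child 1 is the <p> element
$child = $node->childNodes->item(1);
$newNode = $doc->createTextNode("and");
$node->replaceChild($newNode, $child);
// Merge the three adjacent text nodes into one
$node->normalize();
// Check for correctness
echo $doc->saveHTML();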
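
If you need this for more than one node, the same steps can be wrapped in a function. This is only a sketch under my own assumptions: the name replaceElementsWithText and its optional $replacement parameter are hypothetical, and it assumes the matched elements are not the document's root element.

```php
<?php
// Hypothetical helper: replace every <$tagName> element with a text node holding
// $replacement, or the element's own text content when $replacement is null.
function replaceElementsWithText(DOMDocument $doc, string $tagName, ?string $replacement = null): void
{
    // getElementsByTagName() returns a live list, so copy it before editing.
    $elements = iterator_to_array($doc->getElementsByTagName($tagName));
    foreach ($elements as $element) {
        $parent = $element->parentNode;
        $text = $doc->createTextNode($replacement ?? $element->textContent);
        $parent->replaceChild($text, $element);
        // Merge the new text node with its text neighbours.
        $parent->normalize();
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<div>text <p>test</p> more text</div>');
$div = $doc->getElementsByTagName('div')->item(0);

replaceElementsWithText($doc, 'p', 'and');

echo $div->childNodes->length, "\n"; // 1: a single merged text node
echo $doc->saveHTML($div), "\n";     // serialize only the <div>
```

Passing the node to saveHTML() prints just that element instead of the whole document with the <html> and <body> wrappers that loadHTML() adds.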
aphid