
Task: scrape the content inside a DIV tag with a given ID and return it as XHTML. I am using 'PHP Simple HTML DOM Parser'.

Simplified example HTML:

<html><head></head>
<body>
<h1>Head</h1>
<div class="page">
<div id="content">
<h2>Section head</h2>
<p>Text</p>
</div>
<div id="footer">Footer text</div>
</div>
</body>
</html>

I can get the content OK with:

$content = $html->find('#content');

$content is now an array of simple_html_dom_node objects, not a single object (corrected).

How do I convert that back to XHTML so I just have:

<div id="content">
<h2>Section head</h2>
<p>Text</p>
</div>

Thanks

mjpg

2 Answers


Have you tried:

// Dumps the internal DOM tree back into a string
$str = $content->save();

Reference: http://simplehtmldom.sourceforge.net/manual.htm

  • I have corrected my question: $html->find('#content') returns an array, NOT an object, so I cannot use $content->save(); – mjpg Jun 07 '13 at 19:57

This worked OK, using PHP's built-in DOMDocument class instead of Simple HTML DOM:

// Sample HTML string
$html_str = '<html><head></head><body><h1>Head</h1><div class="page"><div id="content"><h2>Section head</h2><p>Text</p></div><div id="footer">Footer text</div></div></body></html>';

// Create new DOM object
$dom = new DOMDocument();

// Parse the HTML string ($dom->loadHTMLFile() can load from a URL instead, if your host allows it)
$dom->loadHTML($html_str);

// Get DIV id="content"
$element = $dom->getElementById('content');

// saveXML() serialises just this element using XML syntax, so the result is XHTML-style markup
echo $dom->saveXML($element);

// Free the document. DOMDocument has no clear() method (that belongs to
// Simple HTML DOM), so calling $dom->clear() here would be a fatal error.
unset($dom);
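The code above parses the string with loadHTML(), which is why getElementById() finds the div. If you instead parse genuine XHTML strictly with DOMDocument::loadXML(), getElementById() will typically return null, because without a DTD (or a call to DOMElement::setIdAttribute()) libxml does not treat a plain id attribute as an ID. A minimal sketch, assuming well-formed XHTML input, that selects the element with DOMXPath instead; the $xhtml sample string is made up for illustration:

```php
<?php
// Hypothetical well-formed XHTML sample, mirroring the question's markup.
$xhtml = '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>'
       . '<body><h1>Head</h1><div class="page"><div id="content"><h2>Section head</h2>'
       . '<p>Text</p></div><div id="footer">Footer text</div></div></body></html>';

$dom = new DOMDocument();
$dom->loadXML($xhtml); // strict XML parsing: the input must be well-formed

// Select by attribute value; "*" matches elements in any namespace,
// so the XHTML default namespace does not need to be registered.
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//*[@id="content"]');

$result = '';
if ($nodes !== false && $nodes->length > 0) {
    // saveXML() with a node argument serialises only that subtree.
    $result = $dom->saveXML($nodes->item(0));
}
echo $result, "\n";
```

The query //*[@id="content"] is roughly the XPath equivalent of the #content selector used with Simple HTML DOM in the question.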

If you output this inside another template, you will need to declare the correct charset there so that characters display properly.
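The charset issue can also bite on input: when the markup declares no encoding, loadHTML() may assume ISO-8859-1 (this depends on the libxml version), so UTF-8 text such as "Café" can come out garbled. A minimal sketch, assuming UTF-8 input, that prepends a Content-Type meta declaration before parsing (a common workaround, not an official API) and collects libxml warnings about imperfect real-world HTML instead of printing them. The helper name extract_by_id() is hypothetical:

```php
<?php
// Hypothetical helper: return the XHTML of the element with the given id,
// or null if no such element exists. Assumes $html is UTF-8 encoded.
function extract_by_id(string $html, string $id): ?string
{
    $dom = new DOMDocument();

    // Collect parser warnings (e.g. unknown tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);

    // Assumed workaround: declare the encoding so libxml parses UTF-8 correctly.
    $meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
    $dom->loadHTML($meta . $html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $element = $dom->getElementById($id);
    return $element === null ? null : $dom->saveXML($element);
}

$sample = '<div class="page"><div id="content"><h2>Section head</h2>'
        . '<p>Café</p></div><div id="footer">Footer text</div></div>';

$out = extract_by_id($sample, 'content');
echo $out, "\n";
```

Non-ASCII characters may be serialised either as raw UTF-8 or as numeric character references depending on the libxml version; both render correctly once the surrounding page declares UTF-8.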

mjpg