
I want to get all the images on any website using DOMDocument, but I can't even load the HTML, for reasons I don't understand yet.

```php
$url = "http://<any_url_here>/";
$dom = new DOMDocument();
@$dom->loadHTML($url); // I have also tried removing @
$dom->preserveWhiteSpace = false;
$dom->saveHTML();
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
    echo $image->getAttribute('src');
}
```

What happens is that nothing gets printed. Did I do something wrong in the code?

Leonid
  • The reason you don't get an error message is probably this line: `@$dom->loadHTML($url);`. In PHP, the '@' operator suppresses error messages raised by that expression. – S.Visser Apr 09 '13 at 07:32
  • I removed it ages ago, but I still got no results... – Leonid Apr 09 '13 at 07:34
  • You don't get a result because `$dom->loadHTML()` expects HTML. You give it a URL; you first need to get the HTML of the page you want to parse. You can use `file_get_contents()` for that. (See answer.) – S.Visser Apr 09 '13 at 07:36
  • I added `$html = file_get_contents("http://sitehere/");`, then loaded the HTML with `$dom->loadHTML($html);`. Now it gives me an error: DOMDocument::loadHTML(): Attribute class redefined in Entity – Leonid Apr 09 '13 at 08:02
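The diagnosis in the comments can be checked without any network access. `loadHTML()` parses whatever string it is given as markup, so a URL ends up as a plain text node and there are no `<img>` elements to find. A minimal sketch, where `countImages()` is a hypothetical helper and the inline HTML stands in for a downloaded page:

```php
<?php
// Hypothetical helper: parse $markup with DOMDocument::loadHTML() and count the <img> elements.
function countImages(string $markup): int
{
    $dom = new DOMDocument();
    $dom->loadHTML($markup); // treats $markup as HTML source; it never fetches anything
    return $dom->getElementsByTagName('img')->length;
}

// A URL is just text to the parser, so the resulting document has no <img> elements.
echo countImages('http://example.com/'), PHP_EOL; // 0

// Real HTML (what file_get_contents() would return for a page) does contain them.
echo countImages('<p><img src="a.png"><img src="b.png"></p>'), PHP_EOL; // 2
```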

1 Answer


You don't get a result because `$dom->loadHTML()` expects a string of HTML. You're giving it a URL; you first need to fetch the HTML of the page you want to parse. You can use `file_get_contents()` for that.

I use this in my image-grabbing class, and it works fine for me.

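```php
<?php
// Fetch the page first: loadHTML() parses a string of HTML, not a URL.
$html = file_get_contents('http://www.google.com/');
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false; // set before loadHTML(), otherwise it has no effect on this load
$dom->loadHTML($html);
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
    echo $image->getAttribute('src'), PHP_EOL; // one src per line
}
```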
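A variation on the answer's code, assuming you want the `src` values back as an array rather than echoed: the parsing is moved into a hypothetical `extractImageSources()` function that takes an HTML string, so it can be tried offline and the download step (`file_get_contents()`) stays separate. It also skips `<img>` tags without a `src`, for which `getAttribute()` would return an empty string.

```php
<?php
/**
 * Hypothetical helper: return the src attribute of every <img> in an HTML string.
 * Fetching is left to the caller, so the parsing step can be tested without a network.
 */
function extractImageSources(string $html): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $sources = [];
    foreach ($dom->getElementsByTagName('img') as $image) {
        // Skip <img> tags that have no src attribute at all.
        if ($image->hasAttribute('src')) {
            $sources[] = $image->getAttribute('src');
        }
    }
    return $sources;
}

// With a live page (file_get_contents() on http:// URLs needs allow_url_fopen enabled):
// $sources = extractImageSources(file_get_contents('http://www.google.com/'));

print_r(extractImageSources('<div><img src="/logo.png"><img alt="no src"><img src="x.jpg"></div>'));
```

The `src` values come back exactly as written in the page, so relative paths such as `/logo.png` are not resolved against the site's URL.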
Leonid
S.Visser
  • I get an "Attribute class redefined in Entity" error now: `$dom = new DOMDocument; $htmls = file_get_contents("http://philcooke.com/inspiration-happens-but-the-best-ideas-take-time/"); $dom->loadHTML($htmls);` – Leonid Apr 09 '13 at 08:34
  • Your answer was almost right; just add an "@" character before `$dom->loadHTML($html)`. – Leonid Apr 09 '13 at 08:40
  • As an alternative to prepending '@' to `$dom->loadHTML($html)` to suppress the error, you could use tidy to clean the HTML first: `$tidy = tidy_parse_string($html); $html = $tidy->html()->value;`. But maybe this is too much. – Kurt Zhong Nov 28 '13 at 08:09
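The comments settle on `@` to silence the warning. Another option built into PHP is `libxml_use_internal_errors(true)`, which makes libxml buffer parse errors so they can be inspected or discarded instead of being printed. A minimal sketch; the helper name `loadHtmlQuietly` and the sample markup are made up for illustration, and whether a given libxml2 version actually reports the duplicate attribute may vary.

```php
<?php
// Hypothetical helper: load HTML while collecting libxml parse errors instead of printing them.
function loadHtmlQuietly(string $html): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // buffer errors instead of emitting PHP warnings
    $dom->loadHTML($html);
    $errors = libxml_get_errors(); // LibXMLError objects, if the parser reported any
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting
    return [$dom, $errors];
}

// Duplicate attributes like this are a typical cause of "Attribute class redefined".
[$dom, $errors] = loadHtmlQuietly('<p><img class="a" class="b" src="pic.png"></p>');
foreach ($dom->getElementsByTagName('img') as $image) {
    echo $image->getAttribute('src'), PHP_EOL;
}
foreach ($errors as $error) {
    echo 'libxml: ', trim($error->message), PHP_EOL;
}
```

Unlike `@`, this keeps the messages available, so you can log them or decide which ones matter.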