
I use simple_html_dom to get a site's images. But sometimes the image links are not prefixed with the full domain URI, e.g. http://example.com. Instead they look something like this:

  • images/_home-ss-21.jpg
  • /_home-ss-22b.jpg
  • ./_1249a7s.png
  • ../../../a19489s_20110412.jpeg

How can I convert these URIs to absolute URIs that include the protocol and domain?

<?php
header('Content-type:text/html; charset=utf-8');
require_once 'simple_html_dom.php';
$v = 'http://www.typepad.com/';
$html = file_get_html($v);
foreach($html->find('img') as $element) {
    echo $element->src.'<hr />';   
}
?>
Alex Gyoshev
yuli chika

3 Answers


Inside your foreach loop, you can try the following to build the full URL of each image.

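$img_src = $element->src;
// Only prefix sources that do not already start with http:// or https://
if (!preg_match('#^https?://#i', $img_src)) {
    // Join with exactly one "/" (assumes $v is the site root, e.g. http://www.typepad.com/)
    $img_src = rtrim($v, '/') . '/' . ltrim($img_src, '/');
}
echo $img_src . '<hr />';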
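A source can already be absolute without starting with http://, for example https://... or a protocol-relative //cdn.example.com/... address. A minimal sketch of a broader check, using a hypothetical helper name `is_absolute_url` and only PHP's built-in `preg_match()` and `strncmp()`: it treats anything with a URI scheme (a letter followed by letters, digits, `+`, `-` or `.`, then `:`) or a leading `//` as absolute.

```php
<?php
// Hypothetical helper: true when $src already names a host, i.e. it has a
// scheme such as http:/https: or is protocol-relative ("//host/...").
function is_absolute_url(string $src): bool
{
    // URI scheme: a letter, then letters, digits, "+", "-" or ".", then ":"
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $src)) {
        return true;
    }
    // Protocol-relative URL such as //cdn.example.com/pic.jpg
    return strncmp($src, '//', 2) === 0;
}

var_dump(is_absolute_url('https://example.com/a.jpg')); // bool(true)
var_dump(is_absolute_url('//cdn.example.com/a.jpg'));   // bool(true)
var_dump(is_absolute_url('images/_home-ss-21.jpg'));    // bool(false)
```

Only sources for which this returns false need the site URL prepended.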

There are also existing scripts that convert relative URLs to absolute URLs:

I have never tried them, but they should help you get past this.

Treffynnon

3 options:

  1. The image src starts with http://: use it as a direct link.
  2. The image src starts with /: use the other site's root URL (e.g. http://example.com) + the image path.
  3. The image src doesn't start with /: use the full URL of the directory containing the page you are checking + the image path.
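The three cases above can be sketched as a small PHP function. This is a minimal sketch, not a full RFC 3986 resolver: `resolve_img_src` is a hypothetical name, it relies only on PHP's built-in `parse_url()`, it expects the URL of the page the image was found on (not just the domain), and it does not handle protocol-relative //host/... sources. It also leaves `./` and `../` segments in place; a browser normalises them when it loads the URL.

```php
<?php
// Hypothetical helper implementing the three cases.
// $pageUrl is the address of the page the <img> was found on.
function resolve_img_src(string $pageUrl, string $src): string
{
    // Case 1: already absolute (http:// or https://): use as-is
    if (preg_match('#^https?://#i', $src)) {
        return $src;
    }

    $parts = parse_url($pageUrl);
    $root  = $parts['scheme'] . '://' . $parts['host']
           . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Case 2: root-relative path: site root + path
    if ($src !== '' && $src[0] === '/') {
        return $root . $src;
    }

    // Case 3: relative to the directory that contains the page
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1); // keep the trailing "/"
    return $root . $dir . $src;
}

echo resolve_img_src('http://www.typepad.com/', 'images/_home-ss-21.jpg'), "\n";
// http://www.typepad.com/images/_home-ss-21.jpg
echo resolve_img_src('http://example.com/ice_cream/sundae.html', '/_home-ss-22b.jpg'), "\n";
// http://example.com/_home-ss-22b.jpg
```

For example, `resolve_img_src('http://example.com/ice_cream/sundae.html', '../images/x.gif')` returns `http://example.com/ice_cream/../images/x.gif`, which points at the same file as `http://example.com/images/x.gif`.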
Hugo Delsing

./ is the current directory, so if you are at http://example.com and see an image with the src attribute ./hoopy_frood.png, the full address is http://example.com/hoopy_frood.png.

../ means one directory up. For example, at http://example.com/ice_cream/sundae.html, an image with the src attribute ../images/hoopier_is_not_a_word.gif is in a directory called images, which sits in the site root alongside the directory called ice_cream, so its full address is http://example.com/images/hoopier_is_not_a_word.gif.
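Both rules can be applied mechanically: append the relative src to the directory of the current page, then collapse the `.` and `..` segments. A minimal sketch, assuming the combined path already starts with `/`; `remove_dot_segments` is a hypothetical helper named after the algorithm in RFC 3986 section 5.2.4, and this stack-based version is a simplification rather than a full implementation of that RFC. Extra `..` segments that would climb above the site root are ignored.

```php
<?php
// Hypothetical helper: collapses "." and ".." segments in an absolute URL
// path (one that starts with "/"), similar in effect to RFC 3986 section 5.2.4.
function remove_dot_segments(string $path): string
{
    $segments = explode('/', $path);   // "/a/./b" -> ["", "a", ".", "b"]
    $out = [];
    foreach ($segments as $seg) {
        if ($seg === '.') {
            continue;                  // "./" = current directory: drop it
        }
        if ($seg === '..') {
            // "../" = one directory up; index 0 is the root, never pop it
            if (count($out) > 1) {
                array_pop($out);
            }
            continue;
        }
        $out[] = $seg;
    }
    // A path ending in "." or ".." still names a directory: keep the trailing "/"
    $last = end($segments);
    if ($last === '.' || $last === '..') {
        $out[] = '';
    }
    return implode('/', $out);
}

$host = 'http://example.com';
echo $host . remove_dot_segments('/./hoopy_frood.png'), "\n";
// http://example.com/hoopy_frood.png
echo $host . remove_dot_segments('/ice_cream/../images/hoopier_is_not_a_word.gif'), "\n";
// http://example.com/images/hoopier_is_not_a_word.gif
```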

Andbdrew