I am working on web scraping application using simple_html_dom
. I need to extract all the images in a web page. The following are the possibilities:
<img>
tag images- if there is a css with the
<style>
tag in the same page. - if there is an image with the inline style with
<div>
or with some other tag.
I can scrape all the images by using the following code.
function download_images($html, $page_url , $local_url){
foreach($html->find('img') as $element) {
$img_url = $element->src;
$img_url = rel2abs($img_url, $page_url);
$parts = parse_url($img_url);
$img_path= $parts['path'];
$url_to_be_change = $GLOBALS['website_server_root'].$img_path;
download_file($img_url, $GLOBALS['website_local_root'].$img_path);
$element->src=$url_to_be_change;
}
$css_inline = $html->find("style");
$matches = array();
preg_match_all( "/url\((.*?)\)/", $css_inline, $matches, PREG_SET_ORDER );
foreach ( $matches as $match ) {
$img_url = trim( $match[1], "\"'" );
$img_url = rel2abs($img_url, $page_url);
$parts = parse_url($img_url);
$img_path= $parts['path'];
$url_to_be_change = $GLOBALS['website_server_root'].$img_path ;
download_file($img_url , $GLOBALS['website_local_root'].$img_path);
$html = str_replace($img_url , $url_to_be_change , $html );
}
return $html;
}
$html = download_images($html , $page_url , $dir); // working fine
$html = str_get_html ($html);
$html->save($dir. "/" . $ff);
Please note that, I am modifying the HTML too after image downloading.
downloading is working fine. but when i am trying to save the HTML, then its giving the following error:
PHP Fatal error: Cannot use object of type simple_html_dom as array
Important: its working perfectly fine, if I am not using str_replace
and second loop.
Fatal error: Cannot use object of type simple_html_dom as array in /var/www/html/app/framework/cache/includes/simple_html_dom.php on line 1167