I'm trying to use PHP to get metadata such as og:image, the title, or the description of a page.

I'm using this code:

<?php
$sites_html = file_get_contents($url);

$html = new DOMDocument();
@$html->loadHTML($sites_html);
$meta_og_img = null;
//Get all meta tags and loop through them.
foreach($html->getElementsByTagName('meta') as $meta) {
    //If the property attribute of the meta tag is og:image
    if($meta->getAttribute('property')=='og:image'){
        //Assign the value from content attribute to $meta_og_img
        $meta_og_img = $meta->getAttribute('content');
    }
}
echo $meta_og_img;
?>

When I use this URL (https://www.elmundo.es/papel/2019/01/28/5c4ed8effc6c83d2718b4605.html) it works perfectly, but when I use this one (https://andresmartin.org/2016/09/mindfulness-la-fibromialgia-mirar-dolor-amabilidad-alivia-malestar-reduce-dolor/) I get an error.

How can I avoid this error? If that isn't possible, how else can I get the metadata?

I don't think it matters, but I'm using Laravel.

EDIT: Here is a screenshot of the error https://pasteboard.co/HYPI7KV.png

Juan Lopez
  • Those sites might simply reject requests whose User-Agent header does not make it look like the request came from a “regular” browser, so I’d try faking one of those first. (Of course, there could be a multitude of other reasons as well. Sites using popular DDoS protection services such as Cloudfront might be a tougher challenge.) – 04FS Jan 30 '19 at 11:29
  • Your code is working fine on my end. Can you please share a screenshot? – Swaroop Deval Jan 30 '19 at 11:31
  • @swaroopDeval Screenshot added – Juan Lopez Jan 30 '19 at 11:37
  • Why is there an @ in front of $html? – Swaroop Deval Jan 30 '19 at 11:42
  • As @04FS said, this website does not accept this kind of request. That is why it returned "403 Forbidden" (meaning you don't have permission to access the file). – Swaroop Deval Jan 30 '19 at 11:44
  • But if I paste the link into a Facebook post, the image and other info appear. There must be a way to get this data. – Juan Lopez Jan 30 '19 at 11:46
  • Did you try what I suggested? – 04FS Jan 30 '19 at 11:48
  • @04FS Sorry, but I don't know how to do what you suggested. Anyway, I'm using Chrome. – Juan Lopez Jan 30 '19 at 11:50
  • Try setting a User-Agent for PHP to use with such requests. http://php.net/manual/en/filesystem.configuration.php#ini.user-agent – 04FS Jan 30 '19 at 11:58
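
The `user_agent` setting linked in the last comment can also be changed at runtime with `ini_set()`, which affects later `file_get_contents()` calls over HTTP. A minimal sketch, assuming a hypothetical helper name and an example User-Agent string (neither comes from the thread):

```php
<?php
// Hypothetical helper: sets the User-Agent that PHP's HTTP stream wrapper sends.
// The string passed in below is only an example value, not a required one.
function use_browser_like_user_agent(string $userAgent): void
{
    ini_set('user_agent', $userAgent);
}

use_browser_like_user_agent('Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36');

// After this call, the question's original line can stay unchanged:
// $sites_html = file_get_contents($url);
```

Whether a given site accepts the request still depends on that site; as noted above, the User-Agent is only one of several possible reasons for a 403.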

1 Answer

I finally found a way.

I created a stream context that sends a browser-like User-Agent header and passed it to file_get_contents:

// Send a browser-like User-Agent so the server does not reject the request.
$context = stream_context_create(
    array(
        "http" => array(
            "header" => "User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36"
        )
    )
);
// Pass the context as the third argument; the second (use_include_path) stays false.
$sites_html = file_get_contents('https://andresmartin.org/2016/09/mindfulness-la-fibromialgia-mirar-dolor-amabilidad-alivia-malestar-reduce-dolor/', false, $context);

Now it works fine.
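
For completeness, here is one way the context above could be combined with the parsing loop from the question. This is only a sketch: the function names `fetch_html` and `extract_meta` and the 10-second timeout are my own assumptions, and it uses `libxml_use_internal_errors()` in place of the `@` operator to silence markup warnings.

```php
<?php
// Hypothetical helper: downloads a page with a browser-like User-Agent.
// Returns null when the request fails (for example on a 403 response).
function fetch_html(string $url): ?string
{
    $context = stream_context_create([
        'http' => [
            'header'  => "User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36\r\n",
            'timeout' => 10, // assumed value, in seconds
        ],
    ]);
    // file_get_contents() returns false and raises a warning on HTTP errors.
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}

// Hypothetical helper: pulls title, description and og:image out of an HTML string.
function extract_meta(string $html): array
{
    $meta = ['title' => null, 'description' => null, 'og:image' => null];
    if ($html === '') {
        return $meta; // loadHTML() rejects an empty string
    }

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length > 0) {
        $meta['title'] = trim($titles->item(0)->textContent);
    }
    foreach ($doc->getElementsByTagName('meta') as $tag) {
        if ($tag->getAttribute('name') === 'description') {
            $meta['description'] = $tag->getAttribute('content');
        }
        if ($tag->getAttribute('property') === 'og:image') {
            $meta['og:image'] = $tag->getAttribute('content');
        }
    }
    return $meta;
}

// Usage (performs a network request, so it is left commented out here):
// $html = fetch_html($url);
// $meta = $html === null ? null : extract_meta($html);
```

Keeping the download and the parsing in separate functions makes it possible to check the parsing against a fixed HTML string without touching the network.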

Juan Lopez