Better way to parse string looking for specific tag

Question

I am looking at a page's source and want to extract the "og:image" image URL.

I am using the following, which works and, I think, covers all eventualities apart from the relative-URL issue. It might not be the most efficient way to do it, though. I have commented the code to show what each line does ($html is the page source):

$og_img = explode( '<meta property="og:image" content=', $html); // strip out beginning
$og_img = explode('>', $og_img[1]); // strip out end
if(substr($og_img[0], -1)=='/'){ $og_img[0] = substr($og_img[0], 0, -1); } // strip / if used /> to close the tag
$og_img[0] = str_replace("'", "", $og_img[0]); // strip ' ... ' apostrophes if used
$og_img[0] = str_replace('"', '', $og_img[0]); // strip " ... " double quotes if used

Is there a more efficient way?

Could we have an example of what $html looks like? Otherwise a preg_match('//',$html) could do the trick — , Aug 30 '12 at 09:22
@FoxMaSk html is simply page source - doesn't preg_match return what's in the expression though, as opposed to what's between the markers? — StudioTime, Aug 30 '12 at 09:32
You'd probably want to add another parameter of $matches like so: `preg_match('//',$html,$matches)` and then use `print_r` on `$matches`. `preg_match` itself only returns 1 if the pattern matched and 0 if it did not (or false on error); you need to put the matches into a variable. ***Edit***; I should add that this method would fail with ``, ``, `` etc. You're probably better off using an HTML parser such as http://simplehtmldom.sourceforge.net/ — h2ooooooo, Aug 30 '12 at 09:34
Possible duplicate of [How to parse and process HTML with PHP?](http://stackoverflow.com/questions/3577641/how-to-parse-and-process-html-with-php). Regex is an alternative for coherent HTML, if you know how to use it, while `explode` is almost always the most silly and cumbersome workaround. — mario, Aug 30 '12 at 09:36
@mario It's not a duplicate, I'm asking if there is a more efficient way to do it - therefore trying to learn - adding words like silly without any explanation is just a waste of everyone's time who reads it. — StudioTime, Aug 30 '12 at 09:55
Here's your downvote. Of course it's a duplicate. Extracting snippets from HTML comes up five times a day. You did not bother to search. And I'm not obliged to find the best duplicate ever to compensate for your effort. Asking others to optimize your cumbersome code for efficiency doesn't make it any more interesting a question, quite the opposite. I will however not allude to who is wasting everyone's time. — mario, Aug 30 '12 at 10:41

score 0 · Accepted Answer · answered Aug 30 '12 at 09:41

Don't roll it yourself.

Go use DOM. E.g.

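$doc = new DOMDocument();
@$doc->loadHTML($html); // @ suppresses the warnings libxml raises on malformed real-world HTML
$og_image_content = null; // stays null if the page has no og:image tag
foreach ($doc->getElementsByTagName('meta') as $meta)
{
    // og:image is declared with a property attribute, not a name attribute
    if ($meta->getAttribute('property') === 'og:image')
    {
        $og_image_content = $meta->getAttribute('content');
    }
}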
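
The same lookup can probably be written more compactly with DOMXPath. The following is only a sketch under a few assumptions: the helper name `find_og_image` and the sample markup are made up for illustration, and it returns the first match rather than the last one as the loop above does. It also swaps the `@` operator for `libxml_use_internal_errors()`, which is a different way of keeping parse warnings quiet.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the first
// og:image URL found in $html, or null if there is none.
function find_og_image(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }

    $xpath = new DOMXPath($doc);
    // Select the content attribute of any <meta property="og:image"> element.
    $nodes = $xpath->query('//meta[@property="og:image"]/@content');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->nodeValue;
}

// Sample markup made up for illustration.
$html = '<html><head><meta property="og:image" content="http://example.com/pic.png" /></head><body></body></html>';
echo find_og_image($html), "\n";
```

Because the parser normalises the markup, this does not care whether the attribute values use single or double quotes or whether the tag is closed with `/>`.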

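or (haven't tried it though) use the function below. Note that it takes a filename or URL rather than an HTML string, and it only reads meta tags that have a name attribute, so it can miss og:image on pages that declare it with property: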

get_meta_tags()
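
Since that suggestion was untested, here is a small sketch to check how `get_meta_tags()` treats Open Graph tags. The sample page and variable names are made up for illustration. The function reads from a file or URL, so the markup is written to a temporary file first.

```php
<?php
// Sample page made up for illustration: one name-based tag, one property-based tag.
$page = '<html><head>'
      . '<meta name="description" content="Example page">'
      . '<meta property="og:image" content="http://example.com/pic.png">'
      . '</head><body></body></html>';

// get_meta_tags() expects a filename or URL, so write the markup to a temp file.
$file = tempnam(sys_get_temp_dir(), 'meta');
file_put_contents($file, $page);

$tags = get_meta_tags($file);
unlink($file);

// Show which of the two tags ended up in the result array.
var_dump(isset($tags['description']));
var_dump(isset($tags['og:image']));
```

If the second check comes back false, as the documented name-only behaviour suggests it should, the DOM approach is the safer choice for og:image.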
