I have programmatically downloaded the contents of a web page and hold it in a string variable. What is the best way to extract the content URL of the "og:image" meta tag?
For example, assume a snippet from the page source looks like this:
<meta property="og:site_name" content="The Christian Science Monitor" />
<meta property="og:type" content="article" />
<meta property="og:url" content="http://www.csmonitor.com/Business/2013/0729/Cannes-jewel-heist-53-million-in-diamonds-jewels-stolen-from-hotel" />
<meta property="og:description" content="Cannes jewel heist saw $53 million in diamonds and other precious gems stolen from a hotel on the French Riviera. The Cannes jewel heist is the latest in a series of several brazen jewelry thefts in Europe in recent years." />
<meta property="og:image" content="http://www.csmonitor.com/var/ezflow_site/storage/images/media/content/2013/0729-jewels/16474969-1-eng-US/0729-jewels.jpg" />
<meta property="og:title" content="Cannes jewel heist: $53 million in diamonds, jewels stolen from hotel" />
<meta name="sailthru.author" content="Thomas Adamson" />
I would like to extract the string "http://www.csmonitor.com/var/ezflow_site/storage/images/media/content/2013/0729-jewels/16474969-1-eng-US/0729-jewels.jpg", which is the content of the "og:image" tag.
I could write code that searches for substrings and works from there, but I would like to accomplish this with a regular expression, similar to this:
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;
List<Uri> links = new List<Uri>();
string regexImgSrc = @"<img[^>]*?src\s*=\s*[""']?([^'"" >]+?)[ '""][^>]*?>";
MatchCollection matchesImgSrc = Regex.Matches(htmlSource, regexImgSrc, RegexOptions.IgnoreCase | RegexOptions.Singleline);
This last example scrapes a web page's source and matches all the img tags, capturing each src URL. I would like to do the same with og:image tags, but I am not well versed in regular expressions.
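To make the goal concrete, here is a minimal sketch of what such a pattern might look like. It assumes the property attribute comes before the content attribute inside the tag and that both values are quoted, as in the snippet above; a tag with the attributes in the opposite order would not match. The names OgImageSketch, OgImagePattern and ExtractOgImage are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OgImageSketch
{
    // Assumption: property="og:image" appears before content="..." in the same tag,
    // and both attribute values are wrapped in single or double quotes.
    private const string OgImagePattern =
        @"<meta[^>]*?property\s*=\s*[""']og:image[""'][^>]*?content\s*=\s*[""']([^""']+)[""'][^>]*>";

    // Returns the content URL of the first og:image meta tag, or null if none is found.
    public static string ExtractOgImage(string htmlSource)
    {
        Match match = Regex.Match(htmlSource, OgImagePattern,
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Two of the meta tags from the snippet above.
        string htmlSource =
            "<meta property=\"og:type\" content=\"article\" />\n" +
            "<meta property=\"og:image\" content=\"http://www.csmonitor.com/var/ezflow_site/storage/images/media/content/2013/0729-jewels/16474969-1-eng-US/0729-jewels.jpg\" />";

        // Prints the og:image URL.
        Console.WriteLine(ExtractOgImage(htmlSource));
    }
}
```

Because the closing quote is required immediately after og:image, related properties such as og:image:width should not be picked up by this pattern. Regex.Matches could be used instead of Regex.Match if a page carries several og:image tags.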