So I've been trying to use regular expressions to extract information from HTML a tags such as <a href='#'>HTML a tag</a>, for three separate forms the tag can take:
<a id="Anchor_One" name="Anchor_One"> Anchor Details </a>
<a href="#Anchor_Two" name="Anchor_Two" > Anchor Two Details </a>
<a name="Anchor_Three" > Anchor Three Details </a>
So far I have a regular expression that extracts all the attributes from a given HTML tag:
/(\w+)\s*=\s*("[^"]*"|'[^']*'|[^"'\s>]*)/
I also have a regex that matches links whose href attribute is set:
/<a\s[^>]*href=(\"??)([^\" >]*?)\1[^>]*>(.*)<\/a>/siU
But I can't seem to create a pattern that matches the other combinations of attributes a link tag may have:
<a id="Anchor_One" name="Anchor_One"> Anchor Details </a>
<a name="Anchor_Three" > Anchor Three Details </a>
Links that do not have the href attribute set are not picked up by my current pattern, so not all the anchors can be retrieved. This is my current code:
$regexp = '/<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>/siU';
// Parse the page with the provided regular expression.
if (preg_match_all($regexp, $sessionBlock, $htmlMatches))
{
    // $htmlMatches[2] holds the href values, $htmlMatches[3] the link text.
}
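To make the problem reproducible, here is a minimal sketch that runs the href pattern against the three sample anchors, followed by a two-step idea I'm considering: match every a tag regardless of its attributes, then run the attribute regex over each tag's attribute string. The sample contents of $sessionBlock and the names $anchorPattern, $attrPattern and $anchors are my own assumptions for illustration, and I'm not sure this is the right approach for arbitrary HTML.

```php
<?php
// Assumed sample input: the three anchor forms from the question.
$sessionBlock = <<<'HTML'
<a id="Anchor_One" name="Anchor_One"> Anchor Details </a>
<a href="#Anchor_Two" name="Anchor_Two" > Anchor Two Details </a>
<a name="Anchor_Three" > Anchor Three Details </a>
HTML;

// Current pattern: it requires href, so only the second anchor matches.
$regexp = '/<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>/siU';
$hrefCount = preg_match_all($regexp, $sessionBlock, $htmlMatches);

// Hypothetical step 1: match any <a ...>...</a>, capturing the raw
// attribute string (group 1) and the inner text (group 2).
$anchorPattern = '/<a(\s[^>]*)?>(.*?)<\/a>/si';

// Hypothetical step 2: the attribute regex from the question, applied
// to the captured attribute string only.
$attrPattern = '/(\w+)\s*=\s*("[^"]*"|\'[^\']*\'|[^"\'\s>]*)/';

$anchors = [];
preg_match_all($anchorPattern, $sessionBlock, $tagMatches, PREG_SET_ORDER);
foreach ($tagMatches as $tag) {
    $attributes = [];
    preg_match_all($attrPattern, $tag[1], $attrMatches, PREG_SET_ORDER);
    foreach ($attrMatches as $attr) {
        // Strip the surrounding quotes, if any, from the attribute value.
        $attributes[strtolower($attr[1])] = trim($attr[2], "\"'");
    }
    $anchors[] = ['attributes' => $attributes, 'text' => trim($tag[2])];
}

// Anchors without href simply have no 'href' key in their attributes.
foreach ($anchors as $anchor) {
    echo $anchor['attributes']['name'], ': ', $anchor['text'], PHP_EOL;
}
```

With this sample input, $hrefCount is 1 while $anchors ends up with three entries, which is the behaviour I'm after. Nested a tags or a literal > inside an attribute value would still break it.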