I am trying to take the following string:

$Anchors = '<a href="#" class="test1"><div class="test2"><a href="#" class="test3"><div class="test4">';

And get the value of the class attribute of the last anchor tag, in this case "test3." So far I have this:

if(preg_match('/(<a\s.*)(class="|\')([^-\'"]*)("|\')?.*?([^>])/i',$Anchors,$matches)){

But obviously it's not doing what I want it to do. Any help?

SReca
  • Obligatory "don't use regex to parse HTML" comment. – nickb Jul 12 '13 at 17:14
  • You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML. ... O̚​N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜... http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags?page=1&tab=active#tab-top – user20232359723568423357842364 Jul 12 '13 at 17:16

2 Answers

Description

This regular expression will:

  • match the last anchor tag from your string
  • capture the value for the class attribute
  • avoid many of the potential problems with using regex when searching HTML strings

 

.*<a\b(?=\s) # match up to the start of the last anchor tag
(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\sclass=['"]([^"]*)['"]?)  # capture the class attribute value
(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*"\s?\/?> # match the rest of the tag
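As a rough sketch of how the pattern above might be applied in PHP to the string from the question: the `~` delimiter, the `x` (free-spacing), `s` (dot matches newline) and `i` modifiers, and the variable names `$pattern` and `$lastClass` are my assumptions, not something stated in the answer.

```php
<?php
// The string from the question.
$Anchors = '<a href="#" class="test1"><div class="test2"><a href="#" class="test3"><div class="test4">';

// Nowdoc keeps the backslashes and quotes literal.
// "~" is used as the delimiter because "#" would clash with free-spacing comments.
$pattern = <<<'REGEX'
~
.*<a\b(?=\s)
(?=(?:[^>=]|='[^']*'|="[^"]*"|=[^'"][^\s>]*)*?\sclass=['"]([^"]*)['"]?)
(?:[^>=]|='[^']*'|="[^"]*"|=[^'"\s]*)*"\s?\/?>
~xsi
REGEX;

$lastClass = null;
if (preg_match($pattern, $Anchors, $matches)) {
    // Group 1 holds the class value of the last anchor tag.
    $lastClass = $matches[1];
}

echo $lastClass, PHP_EOL;
```

The greedy `.*` at the front is what pushes the match to the last `<a` in the string, so group 1 comes from that tag rather than the first one.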


Example

Live example here: http://www.rubular.com/r/G5F6AD5UyL

Sample text

Note that the last tag contains a difficult edge case: an onmouseover attribute whose value includes the text class="NotTheClass".

<a href="#" class="test1"><div class="test2">
<a onmouseover=' class="NotTheClass" ; funClassRotator(class) ; ' class="test3" href="#" ><div class="test4">

Capture Groups

[0][0] = <a href="#" class="test1"><div class="test2"><a href="#" onmouseover=' class="NotTheClass" ; funClassRotator(class) ; ' class="test3">
[0][1] = test3
Ro Yo Mi

It will be faster to use an HTML parser such as ganon or simplehtmldom.

For example, using simplehtmldom:

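// Assumes simple_html_dom.php has been included and $Anchors holds the markup
$html = str_get_html($Anchors);
foreach($html->find('a') as $element)
   echo $element->class . '<br>';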
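If installing a third-party library is not an option, the same idea can be sketched with PHP's built-in DOM extension. This is a different parser from the ones named above, and the variable names here are only illustrative. It takes the last `a` element in document order and reads its class attribute.

```php
<?php
// The string from the question.
$Anchors = '<a href="#" class="test1"><div class="test2"><a href="#" class="test3"><div class="test4">';

// The fragment has unclosed tags, so collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($Anchors);
libxml_clear_errors();

// getElementsByTagName returns the elements in document order.
$anchors = $dom->getElementsByTagName('a');
$last = $anchors->item($anchors->length - 1);

// item() returns null when there are no anchors at all.
$lastClass = $last !== null ? $last->getAttribute('class') : null;

echo $lastClass, PHP_EOL;
```

With simplehtmldom itself, `$html->find('a', -1)` should give the last anchor directly, since a negative index counts from the end.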