1

I am using PHP and am having trouble parsing the href and the link text out of an anchor tag.

Example: an anchor tag with the title test whose href and text are both http://www.test.com

like this: <a href="http://www.test.com" title="test">http://www.test.com</a>

I want to match all of the text inside the anchor tag.

Thanks in advance.

Haim Evgi
Sanjay Khatri

3 Answers

6

Use DOM:

$text = '<a href="http://www.test.com" title="test">http://www.test.com</a> something else hello world';
$dom = new DOMDocument();
$dom->loadHTML($text);

foreach ($dom->getElementsByTagName('a') as $a) {
    echo $a->textContent;
}

DOM is specifically designed to parse XML and HTML. It will be more robust than any regex solution you can come up with.
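Since the question asks about the href as well as the text, here is a minimal sketch that collects both for every anchor. The helper name extractLinks() is hypothetical, and silencing libxml warnings is an assumption about how you want malformed input handled.

```php
<?php
// Sketch: collect the href and link text of every <a> in an HTML fragment.
// extractLinks() is a hypothetical helper, not part of PHP's DOM extension.
function extractLinks(string $html): array
{
    $dom = new DOMDocument();

    // Collect libxml parse warnings internally instead of emitting them,
    // since real-world HTML is often malformed. Restore the setting afterwards.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => $a->textContent,
        ];
    }
    return $links;
}

$links = extractLinks('<a href="http://www.test.com" title="test">http://www.test.com</a> something else');
print_r($links);
```

getAttribute() returns an empty string when the anchor has no href, so you may want to filter those entries out.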

Daniel Egeberg
  • Not that there's anything "wrong" with how you did it, why didn't you just use `DomElement::getElementsByTagName()` instead of the XPath query? It should be more efficient for that simple path... – ircmaxell Jul 29 '10 at 10:18
-1

Assuming you wish to select the link text of an anchor link with that href, then something like this should work...

$input = '<a href="http://www.test.com" title="test">http://www.test.com</a>';
$pattern = '#<a href="http://www\.test\.com"[^>]*>(.*?)</a>#';

if (preg_match($pattern, $input, $out)) {
    echo $out[1];
}

This is not technically perfect (a > character can legally appear inside a quoted attribute value, which would break the [^>]* part), but it will work in most real-world cases. As several of the comments have mentioned, though, you should be using a DOM parser.
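To illustrate the caveat, here is a small sketch comparing the regex with DOMDocument on an input I made up, where the title attribute contains a literal >. The input string and the variable names are assumptions for illustration only.

```php
<?php
// Hypothetical input: the title attribute contains a literal ">".
$input = '<a href="http://www.test.com" title="a > b">link</a>';

// Regex approach: [^>]* stops at the ">" inside the title attribute,
// so the capture group picks up the rest of the attribute as well.
$pattern = '#<a href="http://www\.test\.com"[^>]*>(.*?)</a>#';
$regexText = preg_match($pattern, $input, $out) ? $out[1] : null;

// DOM approach: the parser reads the quoted attribute value as a whole.
$dom = new DOMDocument();
$dom->loadHTML($input);
$domText = $dom->getElementsByTagName('a')->item(0)->textContent;

var_dump($regexText); // attribute debris followed by the link text
var_dump($domText);   // just the link text
```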

Peter O'Callaghan
-1

If you have already obtained the anchor tag you can extract the href attribute via a regex easily enough:

<a [^>]*href="([^"]*)"[^>]*>
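A minimal sketch of applying a pattern of this shape from PHP. It assumes the capture group is quantified as ([^"]*) so that it spans the whole attribute value, and that the value is double-quoted. The \s after <a and the i flag are my additions.

```php
<?php
// Assumed pattern: the capture group ([^"]*) matches the entire href value.
// Only handles double-quoted values on an already-isolated <a ...> opening tag.
$pattern = '/<a\s[^>]*href="([^"]*)"[^>]*>/i';
$tag = '<a href="http://www.test.com" title="test">';

$href = null;
if (preg_match($pattern, $tag, $m)) {
    $href = $m[1]; // first capture group: the href value
}
echo $href, "\n";
```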

If you instead want to extract the contents of the tag and you know what you are doing, it isn't too hard to write a simple recursive descent parser, using cascading regexes, that will parse all but the most pathological cases. Unfortunately PHP isn't a good language to learn how to do this, so I wouldn't recommend using this project to learn how.

So if it is the contents you are after, not the attribute, then @katrielalex is right: don't parse HTML with regex. You will run into a world of hurt with nested formatting tags and other legal HTML that isn't compatible with regular expressions.

Recurse