Suppose I have this string:

some striinnngggg <a href="something/some_number">linkk</a> soooo <a href="someotherthing/not_number">asdfsadf</a>

I want to strip every tag of the form <a href="something/some_number"></a> from this string without stripping the tag's content, where some_number can be any number.

Hence, in the example above, the desired end result is

some striinnngggg linkk soooo <a href="someotherthing/not_number">asdfsadf</a>

Notice that the second tag did not get stripped, since the second part of its link is not a number.

How would I accomplish this using regex/PHP's preg functions?

hakre
pillarOfLight
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#answer-1732454 – Axeman Mar 23 '12 at 23:00

2 Answers

Detecting such tags with a regex is quite complicated, since the order of the attributes can change and values can be delimited with double quotes, single quotes, or nothing at all.

I think an easier way to do this is to use DOMDocument to find the matching tags:

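// $html holds the markup to process
$dom = new DOMDocument;
$dom->loadHTML($html);

// Collect every <a> element in the document
$links = $dom->getElementsByTagName('a');

foreach ($links as $link) {
  // Anchored with ^ and $ so that only hrefs of the form "something/number" match
  if (preg_match('/^[a-zA-Z0-9]+\/[0-9]+$/', $link->getAttribute('href'))) {
    echo $link->nodeValue; // do whatever you need to do with the string here
  }
}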
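The loop above only prints the text of each matching link. To actually get the stripped string back, one option is to replace each matching <a> element with a text node and then serialize the fragment again. The following is a minimal sketch, not part of the original answer: the function name stripNumericLinks is made up for illustration, the "123" in the sample string stands in for some_number, and it assumes plain ASCII input (loadHTML needs extra care with other encodings).

```php
<?php
// Hypothetical helper: unwraps <a> elements whose href looks like "name/digits".
function stripNumericLinks(string $html): string
{
    $dom = new DOMDocument();
    // Wrap the fragment in a <div> so it has a single root element. The flags
    // ask libxml not to add <html>/<body> wrappers or a doctype. The @
    // silences parser warnings on imperfect markup.
    @$dom->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

    // Copy the live node list first, because it changes as links are replaced.
    $links = iterator_to_array($dom->getElementsByTagName('a'));

    foreach ($links as $link) {
        if (preg_match('/^[a-zA-Z0-9]+\/[0-9]+$/', $link->getAttribute('href'))) {
            // Swap the whole <a> element for a text node holding its content.
            $text = $dom->createTextNode($link->textContent);
            $link->parentNode->replaceChild($text, $link);
        }
    }

    // Serialize the children of the wrapper <div>, not the wrapper itself.
    $out = '';
    foreach ($dom->documentElement->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$html = 'some striinnngggg <a href="something/123">linkk</a> soooo '
      . '<a href="someotherthing/not_number">asdfsadf</a>';
echo stripNumericLinks($html), PHP_EOL;
```

Copying the node list with iterator_to_array before the loop matters: getElementsByTagName returns a live list, so replacing nodes while iterating over it directly can skip elements.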
Tchoupi

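Expression:

(<a\s[^>]*href="[^"/]+/\d+"[^>]*>)(.*?)(</a>)

Find that, and replace it with the second token (depending on your language it might be $2 or \2), which is just the link text. The href part only matches values of the form something/number, so links whose second part is not a number are left alone. It assumes the href value is delimited with double quotes.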
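In PHP, a replacement like this is typically done with preg_replace. The sketch below is one way it might look, under some assumptions: the function name stripNumericLinksRegex is made up for illustration, the pattern requires a double-quoted href of the form something/digits, and the "123" in the sample string stands in for some_number. Markup with single-quoted or unquoted attributes would need a different pattern or a DOM parser.

```php
<?php
// Hypothetical helper: removes <a href="name/digits">...</a> wrappers with a
// regex, keeping the link text.
function stripNumericLinksRegex(string $html): string
{
    // Group 1: opening tag whose href is "something/digits" (double-quoted).
    // Group 2: the link text (lazy, so it stops at the first closing tag).
    // Group 3: the closing tag.
    // ~ is the delimiter so that / needs no escaping; i = case-insensitive,
    // s = let . match newlines inside the link text.
    $pattern = '~(<a\s[^>]*href="[^"/]+/\d+"[^>]*>)(.*?)(</a>)~is';

    // preg_replace returns null on error; fall back to the input in that case.
    return preg_replace($pattern, '$2', $html) ?? $html;
}

$html = 'some striinnngggg <a href="something/123">linkk</a> soooo '
      . '<a href="someotherthing/not_number">asdfsadf</a>';
echo stripNumericLinksRegex($html), PHP_EOL;
```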

qJake