Possible Duplicate:
Best methods to parse HTML
Hi, I want a regex that matches only anchor tags that contain text, and that captures the link without the domain in a group. For example, I want to match something like this:
<a href="https://stackoverflow.com/questions/ask/users/login" id="login-link">log in</a>
group 1: questions/ask/users/login
but it must not match:
<a href="https://stackoverflow.com/questions/ask/users/login" id="login-link"><img src="https://stackoverflow.com/images/login.png" alt="log in" title="login" /></a>
I have created something like this:
<a.*?href=["']http:\/\/.*?\/(.*?)["'].*?>(.*?)</a>
It works quite well, but it matches all anchor tags, including those that contain only an image.
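One way to narrow the pattern, shown here as a rough Python sketch rather than a definitive fix: require the anchor body to contain no `<` at all, so an anchor wrapping an `<img>` fails to match. The names `ANCHOR_TEXT_ONLY` and `text_links` are made up for illustration, and accepting both `http` and `https` is an assumption (the pattern above only lists `http://`).

```python
import re

# Assumption: "contains text" means the anchor body has no "<" at all,
# so any nested tag (such as <img>) makes the match fail.
# Group 1 is the URL path without scheme and domain, group 2 is the text.
ANCHOR_TEXT_ONLY = re.compile(
    r"""<a\b[^>]*?\bhref=["']https?://[^/"']+/([^"']*)["'][^>]*>([^<]+)</a>""",
    re.IGNORECASE,
)


def text_links(html):
    """Return (path, text) pairs for anchors whose body is plain text."""
    return [(m.group(1), m.group(2)) for m in ANCHOR_TEXT_ONLY.finditer(html)]
```

Note that this also rejects anchors whose text is wrapped in inline markup such as `<b>`, and it still accepts a body that is only whitespace, so it is a narrow fix for the two cases in the question.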
` or `` from the HTML document. That's just variable input. Any question that shows how to extract content with a parser like DOM will solve your problem. And there are hundreds of those. I know that because [I have answered what feels like half of them](http://stackoverflow.com/search?q=user%3A208809+DOM).
– Gordon Jan 14 '11 at 11:20
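To illustrate the parser-based approach the comment points to, here is a minimal sketch. It uses Python's standard `html.parser` module rather than the DOM extension the comment refers to, and the class and function names (`TextAnchorCollector`, `extract_text_links`) are hypothetical. It assumes "contains text" means the anchor has non-blank text and no child elements.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class TextAnchorCollector(HTMLParser):
    """Collect (path, text) for <a href> elements with text and no child tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None        # href of the <a> currently open, if any
        self._text = []          # text pieces seen inside that <a>
        self._has_child = False  # True once any tag appears inside it

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []
            self._has_child = False
        elif self._href is not None:
            # Self-closing tags such as <img ... /> also arrive here.
            self._has_child = True

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            if text and not self._has_child:
                # Keep only the path, dropping scheme, domain and leading slash.
                path = urlparse(self._href).path.lstrip("/")
                self.links.append((path, text))
            self._href = None


def extract_text_links(html):
    """Parse html and return the collected (path, text) pairs."""
    parser = TextAnchorCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Because the parser handles attribute order, quoting and whitespace, the filtering rule lives in ordinary code and can be adjusted (for example, to allow `<b>` inside the link) without rewriting a pattern.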