I'm trying to find URLs in HTML. This is the pattern I'm using, followed by the example I'm trying to match:

href="http://(.+)"(?:.+)

<a href="http://www.etf.rs/" target="_top">

This matches: www.etf.rs/" target=

And it should match: www.etf.rs

It's not important if it also matches some rubbish, but it's important that all URLs are matched. Thanks!

midori
Dusan Milosevic

1 Answer

You can use re.search:

import re

s = '<a href="http://www.etf.rs/" target="_top">'
# [^"]* stops at the first closing quote, so the match cannot run into later attributes
print(re.search(r'"http://([^"]*)"', s).group(1))

Output:

www.etf.rs/
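
re.search returns only the first match. Since the question asks for all URLs, here is a minimal sketch using re.findall instead. The two-line html string is a made-up sample (the second link is hypothetical), and the pattern assumes double-quoted href values that start with http://, as in the question:

```python
import re

# Hypothetical sample input; the second link is invented for illustration.
html = '''<a href="http://www.etf.rs/" target="_top">
<a href="http://example.com/page" class="link">'''

# [^"]+ matches everything up to, but not including, the next double quote,
# so each match ends at the closing quote of its own href attribute.
urls = re.findall(r'href="http://([^"]+)"', html)
print(urls)
```

With this sample it prints ['www.etf.rs/', 'example.com/page']. Single-quoted or unquoted href values, and https:// links, would need a broader pattern.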
midori
  • leave a comment if you don't like the answer, otherwise it's useless – midori Jan 26 '16 at 02:17
  • you are welcome, my comment was for someone who downvoted without comment, it's hard to understand what's not right in the answer without a comment – midori Jan 26 '16 at 02:48