
I have the following regex: http://regexr.com/3d1qb. It is greedy, and I understand why that happens, but how can I fix it?

Currently it matches the whole thing as a single match, but I want it to produce two separate matches instead of one.

Steve


A regex is not the right tool for parsing HTML. This one works for your examples, but it will not work on real-world HTML:

(<a href="https:\/\/www.example.com\/finance-glossary.*?">)([^<]*)(<\/a>)
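A quick way to see the difference is to run a greedy version and the lazy pattern above against the same input. This is a sketch in Python's re module; the two-link HTML string is a hypothetical stand-in, since the original regexr sample is not reproduced in this thread.

```python
import re

# Hypothetical test input: two glossary links on one line.
html = ('<p><a href="https://www.example.com/finance-glossary/bond">Bond</a> and '
        '<a href="https://www.example.com/finance-glossary/stock">Stock</a></p>')

# Greedy: ".*" runs to the LAST '">' and "(.*)" to the LAST "</a>",
# so both links collapse into a single match.
greedy = re.compile(r'(<a href="https://www\.example\.com/finance-glossary.*">)(.*)(</a>)')

# Lazy pattern from the answer: ".*?" stops at the FIRST '">',
# and "[^<]*" cannot run past the next tag.
lazy = re.compile(r'(<a href="https:\/\/www.example.com\/finance-glossary.*?">)([^<]*)(<\/a>)')

print(len(greedy.findall(html)))                    # 1
print([text for _, text, _ in lazy.findall(html)])  # ['Bond', 'Stock']
```

The [^<]* for the link text matters as much as the ?: it guarantees the text group can never swallow a closing tag.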

For example, in your test, the leading <a.*?href can match anything up to the next href, whether that lies in another element, an attribute, or plain text. This is simply not a job for a regex.
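The reason laziness does not help here is that the engine always tries the leftmost starting position first, so .*? happily stretches from an unrelated <a across to the first matching href. The sketch below (Python re, hypothetical input) shows this, plus one assumed mitigation: replacing .*? with [^>]* so the match cannot leave the opening tag. That is still not a real parser; a > inside a quoted attribute value, comments, or unusual attribute order can break it.

```python
import re

# Hypothetical input: an unrelated <a> element precedes the glossary link.
html = ('<a name="top">Top</a> '
        '<a href="https://www.example.com/finance-glossary/bond">Bond</a>')

# Lazy, but the match still starts at the first "<a", and ".*?" crosses
# into the next element until it finds the glossary href.
loose = re.compile(r'<a.*?href="https://www\.example\.com/finance-glossary[^"]*">([^<]*)</a>')

# "[^>]*" cannot cross the end of the opening tag, so the match has to
# begin at the <a> that actually carries the href.
tighter = re.compile(r'<a\s[^>]*href="https://www\.example\.com/finance-glossary[^"]*">([^<]*)</a>')

print(loose.search(html).group(0))
# <a name="top">Top</a> <a href="https://www.example.com/finance-glossary/bond">Bond</a>
print(tighter.search(html).group(0))
# <a href="https://www.example.com/finance-glossary/bond">Bond</a>
```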

http://regexr.com/3d1qh
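If you can run code instead of a single regex, an HTML parser avoids these problems. This is a minimal sketch using Python's standard html.parser, which is a swapped-in technique, not something from the question; the class name GlossaryLinkParser and the PREFIX value are hypothetical, and the input string is made up. It collects the href and text of every <a> whose href starts with the assumed prefix, regardless of attribute order or nested tags inside the link.

```python
from html.parser import HTMLParser


class GlossaryLinkParser(HTMLParser):
    """Collect (href, text) for <a> elements whose href starts with PREFIX."""

    PREFIX = "https://www.example.com/finance-glossary"  # assumed prefix

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the glossary <a> we are inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value); value may be None.
            href = dict(attrs).get("href") or ""
            if href.startswith(self.PREFIX):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


parser = GlossaryLinkParser()
parser.feed('<a name="top">Top</a> '
            '<a class="x" href="https://www.example.com/finance-glossary/bond">Bond <b>yield</b></a>')
parser.close()
print(parser.links)
# [('https://www.example.com/finance-glossary/bond', 'Bond yield')]
```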

Jérémie B