
So I have the following sample XML on a single line:

<foo>123</foo>   <foo>456</foo>  <bar>abc</bar> <foo>789</foo>   <foo>0AB</foo>   <bar>def</bar>

I'm looking for a regex that matches starting at the first pair of <foo> tags and stops at the first <bar>.

I'm trying solutions around:

/<foo>.\+<\/foo>.\+<bar

But this matches the entire line. How do I get it to stop at the first <bar>?

Stewart

1 Answer


This happens because regex quantifiers are greedy by default; that is, they match as much text as possible. In this case, what you want is a non-greedy quantifier, which matches as little as possible, so the match ends at the first <bar>:

<foo>.\{-}<\/foo>.\{-}<bar

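The pattern \{-} is equivalent to *, but is non-greedy, like Perl's *?. Your original pattern used \+ (one or more); if you need that meaning, the non-greedy counterpart is \{-1,}. See :help non-greedy for more details.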
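To see the difference outside Vim, here is a minimal sketch in Python, whose re module uses the Perl-style *? mentioned above. It is only an analogy: the Python patterns stand in for the Vim ones (.+ for .\+, .*? for .\{-}), and the variable names are just for illustration.

```python
import re

# The sample line from the question.
line = ("<foo>123</foo>   <foo>456</foo>  <bar>abc</bar> "
        "<foo>789</foo>   <foo>0AB</foo>   <bar>def</bar>")

# Greedy: .+ takes as much as it can, so the match runs to the LAST <bar.
greedy = re.search(r"<foo>.+</foo>.+<bar", line)

# Non-greedy: .*? (the counterpart of Vim's \{-}) takes as little as it can,
# so the match ends at the FIRST <bar.
lazy = re.search(r"<foo>.*?</foo>.*?<bar", line)

print(greedy.group(0))
print(lazy.group(0))
```

The second match covers only the text from the first <foo> up to the first <bar, which is the behaviour asked for in the question.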

As a side note, you cannot parse HTML or XML in the general case with regular expressions (since regexes are not powerful enough), but in this case I assume that you have a limited subset of data where this is good enough.

bk2204