
I'm trying to use String#sub! in Ruby, and it substitutes far too much.

This is the regex I'm using. You can see it matching too much here: http://rubular.com/r/IUav4KEFWH

<rb>.+<\/rb>

It selects from the first <rb> to the last </rb>, and I want it to select just the first pair. Is there another version of sub I'm not aware of, or a better way to substitute?

It would be easy to turn off multi-line mode and put the tags on separate lines, but I don't want to sacrifice multi-line matching.

hwnd
maek
    Please put your code in your question. It's OK to have the link as well, but questions should be able to survive broken links. – Cary Swoveland Sep 14 '14 at 08:08

3 Answers


Your regex is too greedy:

<rb>.+<\/rb>

Make it non-greedy using:

<rb>.+?<\/rb>

Rubular Demo
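
To see the difference in Ruby itself, here is a minimal sketch. The sample string is an assumption, since the question's actual input lives only behind the Rubular link.

```ruby
# Hypothetical sample input with two <rb> pairs on one line.
text = "<rb>first</rb> middle <rb>second</rb>"

# Greedy: .+ runs to the last </rb>, so the whole string is replaced.
greedy = text.sub(/<rb>.+<\/rb>/, "X")

# Non-greedy: .+? stops at the first </rb> it can reach.
lazy = text.sub(/<rb>.+?<\/rb>/, "X")

puts greedy  # X
puts lazy    # X middle <rb>second</rb>
```

Note that sub already replaces only the first match; the problem is the size of that one match, not the number of matches.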

anubhava

It matches from the first <rb> tag up to the very last </rb> tag because + is a greedy quantifier: it matches as much as it can while still allowing the remainder of the regular expression to match.

You want to use +? for a non-greedy match, meaning "one or more, preferably as few as possible".

<rb>.+?</rb>

Note: To extract data from HTML, an HTML parser is recommended over regular expressions.
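
Since the question wants to keep multi-line matching, the non-greedy pattern can be combined with Ruby's m flag, which lets the dot match newlines. A rough sketch, with a made-up input string:

```ruby
# Hypothetical input: the first <rb> pair spans two lines.
source = "<rb>line one\nline two</rb>\n<rb>other</rb>"

# Without /m the dot cannot cross the newline, so the first pair is skipped.
without_m = source[/<rb>.+?<\/rb>/]

# With /m the dot also matches "\n"; +? still stops at the first </rb>.
pattern = /<rb>.+?<\/rb>/m
first = source[pattern]

text = source.dup              # dup so sub! works even if literals are frozen
text.sub!(pattern, "[removed]")

puts without_m      # <rb>other</rb>
puts first.inspect  # "<rb>line one\nline two</rb>"
puts text.inspect   # "[removed]\n<rb>other</rb>"
```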

hwnd

You can try this variant:

<rb>(?>(?!<\/rb>).)*+<\/rb>

Demo

Or if you want:

<rb>[^<]+<\/rb>

Demo

See the difference between .*? and [^<]+ in this DEMO
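
Since the demo links may not survive, here is a small sketch of how the patterns differ when the content itself contains a tag. The input string is an assumption chosen to show the difference.

```ruby
# Hypothetical input: the first <rb> pair contains a nested <i> tag.
text = "<rb>a <i>b</i></rb> <rb>c</rb>"

# Non-greedy dot: stops at the first </rb>, nested tags are fine.
lazy = text[/<rb>.+?<\/rb>/]

# Lookahead variant from above: consumes any char that does not start </rb>.
tempered = text[/<rb>(?>(?!<\/rb>).)*+<\/rb>/]

# Negated class: [^<]+ cannot cross the nested "<i>", so the first
# pair fails and the match moves on to the second pair.
negated = text[/<rb>[^<]+<\/rb>/]

puts lazy      # <rb>a <i>b</i></rb>
puts tempered  # <rb>a <i>b</i></rb>
puts negated   # <rb>c</rb>
```

So [^<]+ is only a safe choice when the content between the tags never contains a < character. On the other hand, it matches across newlines without needing the m flag.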

walid toumi