
I recently asked a similar question and got an excellent, accurate answer, but now I have a new problem: all of my relevant input is on a single line. I'm not sure how to ask this in an abstract way, so I'll just jump right into my input:

(EDITED to provide a better example)

bear999bear888bear777bear666fox---bear222bear333bear444bear555fox

(The items between the markers are not necessarily numeric)

This is the expression (EDITED to match updated input example):

bear.*bear(?<matchString>(.(?!bear.*bear))*?)bear.*fox

It returns only 444. Is there a way to tweak this so it returns both 777 and 444? It seems to skip over the first match and return only the second one. The (?!...) negative lookahead is there so that it matches only the innermost value on the left side.

I've been testing here: http://regexlib.com/RETester.aspx

This works great when I break it into two lines and turn on multi-line. Why does it stop working when the input is on a single line?

Any advice would be appreciated!

– Josh
  • Can we say that `xxx[^xy]+yyy` is the closing delimiter here? In this particular example, the `/([^x]+?)(?=xxx[^xy]+yyy)/g` pattern is enough... – raina77ow Oct 16 '12 at 23:04
  • @raina77ow - Thanks. I'm not sure I completely follow, but the delimiter is actually a group, rather than the specific character (in another example, it could be "ABC"). I'm a bit of a regex amateur, so I'm not quite sure how to translate that information onto your idea. – Josh Oct 16 '12 at 23:29
  • It doesn't matter how experienced with regex you are (or aren't). :) The key point is ALWAYS to define two sets of data: one that should match (or be captured), and one that shouldn't. That said, can you show a string that the regex I've shown won't process properly? – raina77ow Oct 16 '12 at 23:31
  • @raina77ow - For example, xxx999xxx888xxx777x000xxx666yyy---xxx222xxx333xxx444xxx555yyy should return 777x000 and 444. Thanks for your help on this. – Josh Oct 16 '12 at 23:39
  • @raina77ow - I just updated the question, using "bear" and "fox" instead. I introduced some confusion by using the same repeated character in my example (I was trying to keep it readable but made it worse). – Josh Oct 17 '12 at 00:19

2 Answers


This should work (it does work in that regex tester you've linked in the question):

(?<=bear)(?:(?!bear).)*(?=bear(?:(?!bear).)*fox)

It reads like "match something that is preceded by bear, contains no bear sequence, and is followed by the sequence bear, no bear, fox".

The capturing groups are absent here; the whole match is what you need.
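As a quick check, here is a minimal sketch in Python (the choice of Python's re module is an assumption; the question used an online tester). The pattern itself is the one above, unchanged. `innermost_between` is a hypothetical helper that also accepts other literal markers, since the comments mention delimiters such as "ABC".

```python
import re

def innermost_between(text, left, right):
    """Find each run that follows `left`, contains no `left`, and is
    followed by `left`, a run without `left`, then `right`.

    Hypothetical helper generalizing the pattern
    (?<=bear)(?:(?!bear).)*(?=bear(?:(?!bear).)*fox) to any literal markers.
    re.escape treats the markers as literals, so the lookbehind stays
    fixed-width, which Python's re requires.
    """
    l, r = re.escape(left), re.escape(right)
    pattern = rf"(?<={l})(?:(?!{l}).)*(?={l}(?:(?!{l}).)*{r})"
    # No capturing group, so findall returns each whole match.
    return re.findall(pattern, text)

bears = "bear999bear888bear777bear666fox---bear222bear333bear444bear555fox"
print(innermost_between(bears, "bear", "fox"))  # ['777', '444']

# The example from the comments under the question.
xs = "xxx999xxx888xxx777x000xxx666yyy---xxx222xxx333xxx444xxx555yyy"
print(innermost_between(xs, "xxx", "yyy"))      # ['777x000', '444']
```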

And yes, I can't help wondering why this should be done with a single regex when it actually looks like a job for a tokenizer. :) For example, you can split your line by 'fox' first, then split each part by 'bear', and take the second-to-last piece of each result.
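The split-based approach just described could look like this minimal Python sketch. `second_to_last_pieces` is a hypothetical name; the sketch assumes the markers are plain literal strings and that text after the final 'fox' should be ignored because it has no closing marker.

```python
def second_to_last_pieces(line, left="bear", right="fox"):
    """Split on `right`, then on `left`, keeping the second-to-last piece of each part."""
    results = []
    # Drop the text after the final `right`: it has no closing marker.
    for part in line.split(right)[:-1]:
        pieces = part.split(left)
        # At least two `left` markers are needed for a value to sit between them.
        if len(pieces) >= 3:
            results.append(pieces[-2])
    return results

line = "bear999bear888bear777bear666fox---bear222bear333bear444bear555fox"
print(second_to_last_pieces(line))  # ['777', '444']
```

Because no regex is involved, multi-character markers like "ABC" need no escaping and there is no lookbehind restriction to worry about.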

– raina77ow

Your first .* is greedy. This will work:

xxx.*?xxx.*?xxx(?<matchString>.*?)xxx.*?yyy
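The greediness point can be checked with a short Python sketch (Python's re is an assumption here, so the .NET-style named group (?<matchString>...) is written (?P<matchString>...)). The lazy pattern is the regex above with xxx/yyy renamed to bear/fox to fit the updated question. The last call also suggests why the two-line version worked: `.` does not match a newline by default, so no match can run across the line break. `captures` is a hypothetical helper.

```python
import re

text = "bear999bear888bear777bear666fox---bear222bear333bear444bear555fox"

# The question's pattern, with Python's named-group syntax.
greedy = re.compile(r"bear.*bear(?P<matchString>(.(?!bear.*bear))*?)bear.*fox")
# The answer's lazy pattern, with markers renamed to bear/fox.
lazy = re.compile(r"bear.*?bear.*?bear(?P<matchString>.*?)bear.*?fox")

def captures(pattern, s):
    """Collect the matchString group from every non-overlapping match."""
    return [m.group("matchString") for m in pattern.finditer(s)]

# One greedy match spans the whole line, so there is no second match.
print(captures(greedy, text))       # ['444']
print(captures(lazy, text))         # ['777', '444']

# Split into two lines: '.' cannot cross '\n', so each line matches separately.
two_lines = text.replace("---", "\n---")
print(captures(greedy, two_lines))  # ['777', '444']
```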
– Eric
  • Thanks. Unfortunately I need this to pick the innermost match, though. I don't think your example allows for a scenario such as xxx1xxx2xxx3xxx4xxx5xxx6yyy, in which case it will return "3" instead of the desired "5". – Josh Oct 16 '12 at 23:25
  • In what way is 5 the "innermost match"? – Eric Oct 17 '12 at 06:22
  • xxx.*?xxx.*?xxx will match on xxx1xxx2xxx, which is too far to the left, and result in a match of "3xxx4xxx5". I'm looking for the innermost match of the expression, which would match on xxx3xxx4xxx, which would result in "5". – Josh Oct 17 '12 at 15:36
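Josh's counterexample can be reproduced with a similar sketch (again assuming Python's re; the lookaround pattern from raina77ow's answer is adapted here by replacing bear/fox with xxx/yyy). The lazy pattern starts at the leftmost xxx and captures 3, as the first comment says, while the lookaround pattern returns the desired 5.

```python
import re

s = "xxx1xxx2xxx3xxx4xxx5xxx6yyy"

# The lazy pattern from this answer, in Python named-group syntax.
lazy = re.compile(r"xxx.*?xxx.*?xxx(?P<matchString>.*?)xxx.*?yyy")
# The lookaround pattern from the other answer, adapted to xxx/yyy markers.
inner = re.compile(r"(?<=xxx)(?:(?!xxx).)*(?=xxx(?:(?!xxx).)*yyy)")

print(lazy.search(s).group("matchString"))  # '3'
print(inner.findall(s))                     # ['5']
```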