
I am confused about several things in regular expressions and am trying to sort them out. I have the following string:

{start}do or die{end}extended string

Here are my two regexes, which differ only in the position of the dot:

(.(?!{end}))* //returns: {start}do or di
                                      //^ See here
((?!{end}).)* //returns: {start}do or die
                                      //^ See here

Why does the first regex eat the last "e"?

Also, how does the negative lookahead make the * quantifier behave as if it were non-greedy? That is, why can't it consume characters beyond {end}?

AL-zami

2 Answers


A negative lookahead asserts that the pattern inside it, which in your case is {end}, cannot match at the current position. The . matches any character except a newline.

So with your first regex:

(.(?!{end}))*

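It leaves out the e because the e is directly followed by {end}, so the negative lookahead that comes after the dot fails at that point. In your second regex the lookahead comes before the dot, so it is checked before each character is consumed. {end} does not start at the e, so the e is consumed; the match stops only at the next position, where the lookahead sees {end}. That is why the e is included in your second regex.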
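
The difference can be checked directly. Here is a minimal sketch in Python, which is an assumption since the question does not name a regex flavor. The braces are escaped so that they are read as literals whatever the flavor, and `text` is simply the sample string from the question.

```python
import re

text = "{start}do or die{end}extended string"

# Lookahead AFTER the dot: a character is kept only if {end} does not
# start immediately after it.
after = re.match(r"(.(?!\{end\}))*", text)

# Lookahead BEFORE the dot: a character is kept only if {end} does not
# start at that character.
before = re.match(r"((?!\{end\}).)*", text)

print(after.group(0))   # {start}do or di
print(before.group(0))  # {start}do or die
```

In both cases the lookahead itself consumes nothing; it only decides whether the current repetition of the group is allowed to succeed.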

Rizier123
  • Thanks! But can you be a little more specific? Why will e{end} not match because of the negative lookahead? Shouldn't it be only {end}? – AL-zami Jul 17 '15 at 19:05
  • @AL-zami The `*` quantifier is greedy, so it tries to match as much as possible. `{start}do or die` won't work because of the negative lookahead: the `e` is followed by `{end}`. The longest match that satisfies the lookahead is `{start}do or di`, without the `e`. The second case is the same, except that the order is swapped: it again tries to match as much as possible, but it cannot go past `{start}do or die`, because the lookahead sees `{end}` at the next position. So it ends up with `{start}do or die`. (Lookahead assertions are never part of the match.) – Rizier123 Jul 17 '15 at 19:11
  • @AL-zami So where are we with this question now? – Rizier123 Jul 17 '15 at 19:57
  • Still trying to wrap my head around this. It would help if I could figure out what happens inside the regex engine during the match. I am having a hard time figuring this out! – AL-zami Jul 17 '15 at 20:19
  • @AL-zami The negative lookahead says that it is **impossible** to match the enclosed regex at that position, which in your case is: `(?!` **{end}** `)` . And the `.` matches any character except a newline. So your first regex reads: `[any character except newline][not followed by {end}]`. It can't match `[...]e{end}[...]`, because `{end}` follows the `e`. So it ends with `[...]di[... (no {end} here)]`, because it can match this. Your second regex reads: `[{end} does not start here][any character except newline]`. It can match the `e` of `die`, because `{end}` does not start at the `e`. But it ends there: at the next position `{end}` does start, so it can't match any further. – Rizier123 Jul 17 '15 at 20:26
  • I did some reading and tried to understand your answer. I have written out how the regex engine works through both regexes and added it below as an answer, since it is too long for a comment. Would you please look at it and check whether it is correct? :) – AL-zami Jul 19 '15 at 20:26
  • @AL-zami Since you now understand what's going on, I hope you also understand my answer better and what I meant. – Rizier123 Jul 19 '15 at 20:38

I have worked out how the regex engine processes each of the two regexes.

First, for (.(?!{end}))*, the regex engine proceeds as follows:

"{start}do or die{end}extended string"
 ^  . (dot) matches "{"; the lookahead then tries {end} at the next position and fails to find it, so "{" is included.
"{start}do or die{end}extended string"
  ^ . (dot) matches "s"; the lookahead again fails to find {end} at the next position, so "s" is included.

....
....and so on...
"{start}do or die{end}extended string"
                ^ . (dot) matches "e", but {end} does match at the next position, so the negative lookahead fails and "e" is excluded.
So the match we get is "{start}do or di"

For the second regex, ((?!{end}).)*:

"{start}do or die{end}extended string"
 ^  The lookahead tries {end} here and fails to find it, so the dot consumes "{".

"{start}do or die{end}extended string"
  ^ The lookahead tries {end} here and fails again, so the dot consumes "s".

....
....and so on...
"{start}do or die{end}extended string"
                ^ The lookahead tries {end} here and fails to find it, so the dot consumes "e".
"{start}do or die{end}extended string"
                 ^ The lookahead finds {end} here, so the negative lookahead fails and the repetition stops.

So we end up with the match "{start}do or die"
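
The two walkthroughs above can be mimicked without a regex engine. What follows is only a rough sketch in Python (the language is an assumption, and `trace` is a hypothetical helper written for illustration, not how a real engine is implemented). It steps through the string one character at a time and tests for the stop text either at the current position or at the position after it.

```python
def trace(text, stop, lookahead_first):
    """Mimic ((?!stop).)* if lookahead_first is True, else (.(?!stop))*."""
    pos = 0
    while pos < len(text):
        if text[pos] == "\n":
            break  # the dot does not match a newline
        # Where the lookahead is tested: at the character the dot is about
        # to consume, or at the position right after that character.
        check_at = pos if lookahead_first else pos + 1
        if text.startswith(stop, check_at):
            break  # the negative lookahead fails, so this repetition fails
        pos += 1   # the repetition succeeded; keep this character
    return text[:pos]


text = "{start}do or die{end}extended string"
first = trace(text, "{end}", lookahead_first=False)
second = trace(text, "{end}", lookahead_first=True)
print(first)   # {start}do or di
print(second)  # {start}do or die
```

The only difference between the two runs is `check_at`, which corresponds to the position of the dot relative to the lookahead in the two patterns.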
AL-zami