
I can't quite wrap my head around combining positive lookaheads and backtracking in JavaScript. I read this answer as well as this article, and did some testing on https://regex101.com/, but I don't get the result I expect.

So given the test string banana12, the following works:

  • (?=\d{2,}) - assert that the string contains at least two digits
  • (?=[a-z]{5,}) - assert that the string contains at least five lowercase characters

These two conditions work for both banana12 and 12banana.
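
As a rough check of the claim above, the following sketch reports where each lookahead, used on its own, first succeeds in the two test strings. The variable names (`digitsAhead`, `lettersAhead`, `report`) are made up for illustration; only the two patterns and the two test strings come from the question.

```javascript
// The two lookaheads from the question, each used on its own.
const digitsAhead = /(?=\d{2,})/;
const lettersAhead = /(?=[a-z]{5,})/;

// String.prototype.search returns the index of the first match, or -1.
const report = {};
for (const s of ["banana12", "12banana"]) {
  report[s] = {
    digitsAt: s.search(digitsAhead),
    lettersAt: s.search(lettersAhead),
  };
}

console.log(report);
// banana12: digitsAt 6, lettersAt 0
// 12banana: digitsAt 0, lettersAt 2
```

Each pattern does find a match in both strings, but the two patterns succeed at different indices within the same string.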

However, when I combine the two as (?=\d{2,})(?=[a-z]{5,}), I don't get a match. To get a match on banana12, I need to add \D* to the digit-testing lookahead: (?=\D*\d{2,})(?=[a-z]{5,}).

Why doesn't (?=\d{2,})(?=[a-z]{5,}) work?

If I change the test string to 12banana, I need to use (?=\d{2,})(?=[^a-z]*[a-z]{5,}) to get a match - so this time no backtracking for the digits, but backtracking for the lowercase letters.

So in general, if I want my regex to match both strings correctly (12banana, banana12), I need to use: (?=\D*\d{2,})(?=[^a-z]*[a-z]{5,}).
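
A minimal sketch that exercises this general pattern. The strings banana1 and 12ban are invented here as negative cases and are not part of the original tests; the names `both`, `samples`, and `results` are likewise only illustrative.

```javascript
// General pattern from the question: each lookahead may first skip over
// characters of the "other" kind before reaching the run it requires.
const both = /(?=\D*\d{2,})(?=[^a-z]*[a-z]{5,})/;

// banana1 (only one digit) and 12ban (only three letters) are made-up
// negative cases added for illustration.
const samples = ["banana12", "12banana", "banana1", "12ban"];
const results = samples.map((s) => [s, both.test(s)]);

console.log(results);
// [["banana12", true], ["12banana", true], ["banana1", false], ["12ban", false]]
```

With the \D* and [^a-z]* prefixes, both lookaheads can succeed from index 0 of either test string, which appears to be why this form matches in both orders.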

Why? If both lookaheads work on their own, why don't they work in combination and why is it necessary to add backtracking?

Alexander Popov
  • `(?=\d{2,})(?=[a-z]{5,})` tries to match the POSITION which is IMMEDIATELY followed by 2 or more digits AND 5 or more letters. That is never possible. – Gurmanjot Singh Feb 12 '19 at 11:55
  • `(?=\d{2,})(?=[a-z]{5,})` requires the text immediately to the right to match both patterns at the same time. – Wiktor Stribiżew Feb 12 '19 at 11:56
  • Actually, it is not that complicated. Lookarounds ensure a position and when you add multiple lookarounds the assertion must be true for all of them for this specific position. You can see it on https://regex101.com/r/QCmxts/1 and https://regex101.com/r/QCmxts/2 respectively - the positions differ. – Jan Feb 12 '19 at 11:56
  • https://stackoverflow.com/a/2973609/3832970 does explain it to you. *Lookarounds are zero width assertions. They check for a regex (towards right or left of the current position - based on ahead or behind), succeeds or fails when a match is found (based on if it is positive or negative) and discards the matched portion. **They don't consume any character - the matching for regex following them (if any), will start at the same cursor position**.* – Wiktor Stribiżew Feb 12 '19 at 12:03
  • Nobody said the real reason: They do NOT consume characters. – ibrahim tanyalcin Feb 12 '19 at 12:03
  • WiktorStribiżew, ibrahim-tanyalcin: I already knew that lookaheads don't consume characters, and https://regex101.com/ visualizes this very well. However, that didn't help me understand why the combination doesn't work. My logic was: since they don't consume characters, the two should work in combination, because each one starts at the beginning. What I didn't realize was that there must be a single position that satisfies both conditions. In that sense, no, @WiktorStribiżew, the question/answer you pointed to didn't help me (but your comment did). – Alexander Popov Feb 12 '19 at 12:09
  • I added [another link](https://stackoverflow.com/a/31347783/3832970), it explains that in depth. – Wiktor Stribiżew Feb 12 '19 at 12:12
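
The point made in the comments, that a single position must satisfy every lookahead, can be checked position by position. The sketch below is only an illustration: `holdsAt` and `positions` are hypothetical helpers, and it relies on the sticky (`y`) flag so that each test is pinned to exactly one index.

```javascript
// Sticky (y) regexes only match exactly at lastIndex, so each test
// below checks a single position in the string.
const digitsAhead = /(?=\d{2,})/y;
const lettersAhead = /(?=[a-z]{5,})/y;

// Hypothetical helper: does the lookahead hold at position i of s?
function holdsAt(re, s, i) {
  re.lastIndex = i;
  return re.test(s);
}

// Hypothetical helper: every position of s where the lookahead holds.
function positions(re, s) {
  const out = [];
  for (let i = 0; i <= s.length; i++) {
    if (holdsAt(re, s, i)) out.push(i);
  }
  return out;
}

const s = "banana12";
const digitPositions = positions(digitsAhead, s);   // [6]
const letterPositions = positions(lettersAhead, s); // [0, 1]

// Positions where both lookaheads hold at once: none for banana12.
const shared = digitPositions.filter((i) => letterPositions.includes(i));
console.log(digitPositions, letterPositions, shared);

// The combined pattern needs such a shared position, so it fails.
const combinedMatches = /(?=\d{2,})(?=[a-z]{5,})/.test(s); // false
console.log(combinedMatches);
```

Since the two sets of positions never overlap, the combined pattern has no index at which both assertions are true; the \D* prefix changes that by letting the digit lookahead succeed from index 0 as well.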
