I have a tiny regex: `foo(\b)?`. It was meant as an experiment to see whether I can deduce the existence of a word boundary just by checking whether the first group matched (resulting in an empty string) or not.

I tried this with several languages: PHP, Python, Java, C# and Rust (input entered manually). All of them behave as expected: group 1 is an empty string for the first match and null/None/nothing for the second.
I can't figure out how to write a proper snippet in Go or C++, but regex101 says Go behaves the same way as those; I'm unsure about C++.

However, this is not the case in JS: it outputs `undefined` for group 1 in both matches against `foo food`.

console.config({ maximize: true });

console.log(...'foo food'.matchAll(/foo(\b)?/g));
<script src="https://gh-canon.github.io/stack-snippet-console/console.min.js"></script>

Yet, (\b) without ? does capture an empty string.

console.config({ maximize: true });

console.log(...'foo food'.matchAll(/foo(\b)/g));
<script src="https://gh-canon.github.io/stack-snippet-console/console.min.js"></script>

Considering that ? is greedy, shouldn't (\b) always match and capture an empty string after the first foo, as with other languages?

I can reproduce this in Node.js and Chrome (both V8) as well as Firefox (SpiderMonkey), so this is probably a quirk rather than a bug.
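
For anyone who wants to reproduce this without the snippet console, here is a minimal sketch that should run in plain Node.js or a browser console. It only restates the two cases above; `groupOneOf` is a helper name made up for this example, not part of any library.

```javascript
// Hypothetical helper: collect [matched text, group 1] for every match.
function groupOneOf(pattern, input) {
  return [...input.matchAll(pattern)].map((m) => [m[0], m[1]]);
}

// Optional group: group 1 is undefined in both matches.
const optional = groupOneOf(/foo(\b)?/g, 'foo food');

// Mandatory group: only the first "foo" matches, and group 1 is ''.
const required = groupOneOf(/foo(\b)/g, 'foo food');

console.log(optional); // [ [ 'foo', undefined ], [ 'foo', undefined ] ]
console.log(required); // [ [ 'foo', '' ] ]
```

Comparing `m[1] === undefined` against `m[1] === ''` is the check the experiment relies on, so printing group 1 explicitly makes the difference easier to see than the raw match objects.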

InSync
  • 1
    This does not depend on `\b`, you get the same behavior using `foo()?`, so whenever you ask to match an optional empty string. – logi-kal Jun 29 '23 at 10:06
  • 1
    Possibly related: unlike for all the other languages, in JavaScript `foo(d*?)?` matches the whole word `food` instead of just `foo`. – logi-kal Jun 29 '23 at 10:08
  • Interesting. It seems that `?`, as well as `*` and `{0,n}` (quantifiers that allow 0 repetitions), either require the group to have *some* content or fail altogether. For example, the second group in [`foo((?=(.)))?`](https://regex101.com/r/NVIjAb/1) was never matched. – InSync Jun 29 '23 at 10:25
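
The variants mentioned in the comments can be checked the same way. This is only a sketch run against the question's input `foo food` (the regex101 link may use a different test string); `captures` is a hypothetical helper name introduced for this example.

```javascript
// Hypothetical helper: list [matched text, ...capture groups] per match.
function captures(pattern, input) {
  return [...input.matchAll(pattern)].map((m) => [...m]);
}

// Optional empty group: group 1 stays undefined.
const emptyGroup = captures(/foo()?/g, 'foo food');

// Optional lazy group: the second match extends to "food".
const lazyGroup = captures(/foo(d*?)?/g, 'foo food');

// Optional group holding only a lookahead: neither group is set.
const lookahead = captures(/foo((?=(.)))?/g, 'foo food');

console.log(emptyGroup); // [ [ 'foo', undefined ], [ 'foo', undefined ] ]
console.log(lazyGroup);  // [ [ 'foo', undefined ], [ 'food', 'd' ] ]
console.log(lookahead);  // [ [ 'foo', undefined, undefined ], [ 'foo', undefined, undefined ] ]
```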
