
This must have an answer somewhere already, but I don't even know what to search for.

In JavaScript, I can reference a parenthesized submatch in a replacement string like so:

"abcdefghijklmnopqrstuvwxyz".replace(/(.)/g, "$1-");
// Result: "a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z-"

Now I want to put a 1 instead of a - after each letter:

"abcdefghijklmnopqrstuvwxyz".replace(/(.)/g, "$11");
// Result: "a1b1c1d1e1f1g1h1i1j1k1l1m1n1o1p1q1r1s1t1u1v1w1x1y1z1"

At least Chromium seems to detect that there is no submatch group 11, so it interprets the replacement string as "submatch group 1 followed by a 1".

Let's assume there are at least 11 groups though:

'abcdefghijklmnopqrstuvwxyz'.replace(/^(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)$/, '$11');
// Result: "k"

What I would like to know:

  • Will example 2 work cross-browser? Is it defined somewhere that it should behave like this?
  • Is there a way to explicitly delimit the submatch group reference? In bash, $a and ${a} refer to the same variable, but the latter form separates the name from the text that follows. I am looking for something similar that would make example 3 output "a1" rather than "k".
cdauth

1 Answer


The JavaScript regex engine reads at most two digits after $ as the group number, and it prefers the two-digit reading whenever the pattern has such a group. So $11 refers to Group 11 if the pattern has at least 11 groups, and to Group 1 followed by a literal 1 otherwise. Likewise, $111 refers to Group 11 followed by a literal 1 if there is a Group 11, or to Group 1 followed by a literal 11 if there are fewer than 11 groups. The reference is left as literal text if no matching group exists at all (e.g. "x".replace(/./g, "$1") returns $1).

console.log('abcdefghijklmnopqrstuvwxyz'.replace(/^(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)$/, '$11')); // => "k"
console.log('abcdefghijklmnopqrstuvwxyz'.replace(/^(.)(.)(.)(.)(.)(.)(.)(.)(.).................$/, '$11')); // => "a1"
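
To try the fallback without typing out long patterns, here is a minimal sketch that builds a pattern with a chosen number of groups. The helper name patternWithGroups is made up for this illustration and is not part of any API.

```javascript
// Hypothetical helper (not part of any API): builds an anchored pattern whose
// first `n` characters are each captured in their own group.
function patternWithGroups(n) {
  return new RegExp("^" + "(.)".repeat(n) + ".*$");
}

const input = "abcdefghijklmnopqrstuvwxyz";

// Fewer than 11 groups: "$11" falls back to Group 1 plus a literal "1".
const withNineGroups = input.replace(patternWithGroups(9), "$11");

// At least 11 groups: "$11" is Group 11, i.e. the 11th letter.
const withElevenGroups = input.replace(patternWithGroups(11), "$11");

// No groups at all: the reference is kept as literal text.
const withNoGroups = "x".replace(/./g, "$1");

console.log(withNineGroups);   // "a1"
console.log(withElevenGroups); // "k"
console.log(withNoGroups);     // "$1"
```

The same replacement string gives different results depending only on how many groups the pattern has, which is why the unpadded form is fragile.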

If a reference could be ambiguous, pad the group number with a leading zero so that it is always written with two digits.

$011 is then read as $01 (Group 1) followed by a literal 1:

console.log('abcdefghijklmnopqrstuvwxyz'.replace(/^(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)$/, '$011')); // => "a1"
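
If replacement strings are assembled in code, the padding can be applied in one place. This is only a sketch: groupRef is a hypothetical helper name, and it assumes group numbers from 1 to 99, since references use at most two digits.

```javascript
// Hypothetical helper: always emit a two-digit reference such as "$01" or "$11".
// Only meaningful for group numbers 1 to 99.
function groupRef(n) {
  if (!Number.isInteger(n) || n < 1 || n > 99) {
    throw new RangeError("group number must be an integer from 1 to 99");
  }
  return "$" + String(n).padStart(2, "0");
}

const input = "abcdefghijklmnopqrstuvwxyz";
const pattern = new RegExp("^" + "(.)".repeat(26) + "$");

// Template "$011": Group 1 followed by a literal "1".
const firstThenOne = input.replace(pattern, groupRef(1) + "1");

// Template "$111": Group 11 followed by a literal "1".
const eleventhThenOne = input.replace(pattern, groupRef(11) + "1");

console.log(firstThenOne);    // "a1"
console.log(eleventhThenOne); // "k1"
```

Because the reference is always two digits long, any digit that follows it is unambiguously literal text.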

Alternatively, you can use a named capturing group and refer to it with $<name>, which is delimited explicitly:

console.log('abcdefghijklmnopqrstuvwxyz'.replace(/^(?<name>.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)$/, '$<name>1')); // => "a1"
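
A further option, not covered above, is to pass a function instead of a replacement string. The function receives the captures as arguments, so no $ parsing takes place. A minimal sketch, with variable names chosen only for illustration:

```javascript
const input = "abcdefghijklmnopqrstuvwxyz";
const pattern = new RegExp("^" + "(.)".repeat(26) + "$");

// The replacer is called with (match, p1, p2, ..., offset, string),
// so the first capture is simply the second argument.
const viaPositional = input.replace(pattern, (match, first) => first + "1");

// When the pattern has named groups, the last argument is the groups object.
const viaNamed = input.replace(/^(?<first>.).*$/, (...args) => {
  const groups = args[args.length - 1];
  return groups.first + "1";
});

console.log(viaPositional); // "a1"
console.log(viaNamed);      // "a1"
```

The returned string is inserted as is, so a digit after the capture can never be mistaken for part of a reference.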

In the ECMAScript specification (Table 54: Replacement Text Symbol Substitutions), the backreference syntax is defined only as $n or $nn. Any further digit is literal text, which is why $011 is read as $01 followed by a literal 1.

| Code units | Unicode characters | Replacement text |
| --- | --- | --- |
| 0x0024, N where 0x0031 ≤ N ≤ 0x0039 | $n where n is one of 1 2 3 4 5 6 7 8 9 and $n is not followed by a decimal digit | The nth element of captures, where n is a single digit in the range 1 to 9. If n ≤ m and the nth element of captures is undefined, use the empty String instead. If n > m, no replacement is done. |
| 0x0024, N, N where 0x0030 ≤ N ≤ 0x0039 | $nn where n is one of 0 1 2 3 4 5 6 7 8 9 | The nnth element of captures, where nn is a two-digit decimal number in the range 01 to 99. If nn ≤ m and the nnth element of captures is undefined, use the empty String instead. If nn is 00 or nn > m, no replacement is done. |
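
The two rows, together with the fallback from two digits to one, can be sketched as a small function. This is an illustration, not the engine's actual implementation: expandNumericRefs is a made-up name, it handles only $n and $nn (not $$, $&, $<name> and so on), and captures[0] is assumed to hold Group 1.

```javascript
const isDigit = (ch) => ch !== undefined && ch >= "0" && ch <= "9";

// Illustrative emulation of numeric references only.
// `captures` holds the group values, with captures[0] being Group 1.
function expandNumericRefs(template, captures) {
  const m = captures.length;
  let out = "";
  let i = 0;
  while (i < template.length) {
    if (template[i] === "$" && isDigit(template[i + 1])) {
      // Prefer the two-digit reading when a second digit is present.
      let digitCount = isDigit(template[i + 2]) ? 2 : 1;
      let index = Number(template.slice(i + 1, i + 1 + digitCount));
      // No such two-digit group: retry with the first digit alone.
      if (digitCount === 2 && index > m) {
        digitCount = 1;
        index = Number(template[i + 1]);
      }
      if (index >= 1 && index <= m) {
        // An undefined capture (unmatched group) becomes the empty string.
        out += captures[index - 1] ?? "";
        i += 1 + digitCount;
        continue;
      }
      // Otherwise no replacement is done and the text stays literal.
    }
    out += template[i];
    i += 1;
  }
  return out;
}

const letters = "abcdefghijklmnopqrstuvwxyz".split("");
console.log(expandNumericRefs("$11", letters));             // "k"
console.log(expandNumericRefs("$11", letters.slice(0, 9))); // "a1"
console.log(expandNumericRefs("$011", letters));            // "a1"
console.log(expandNumericRefs("$1", []));                   // "$1"
```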
Wiktor Stribiżew