

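    #include <iostream>
    #include <regex>
    #include <string>

    int main() {
        std::string bstr = "111111111111111111111111111111111111111110";

        // Match every run of six characters and prefix it with '|'.
        // "$00" is intended to mean "the whole match".
        std::regex re(".{6}");
        bstr = std::regex_replace(bstr, re, "|$00");

        std::cout << "bstr: " << bstr << std::endl;

        return 0;
    }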

Why, when I compile the same code with Visual Studio 2022, do I get a different value stored in bstr after running regex_replace than I do with gcc or clang?


Cesar
  • It seems that the [ECMAScript standard](https://262.ecma-international.org/5.1/#sec-15.5.4.11) doesn't define `$00` as the whole match; it only defines `$nn` for values greater than `0`. Try `"|$&"` to make it conformant and see if that fixes the problem (I don't have Visual Studio at hand, so I can't check myself before posting an answer). – Yksisarvinen Sep 04 '22 at 23:35
  • @Yksisarvinen you should post that as an answer – Nick Sep 05 '22 at 02:36
  • @Yksisarvinen `"|$&"` output: `bstr: |111111|111111|111111|111111|111111|111111|111110` in Visual Studio. – Minxin Yu - MSFT Sep 05 '22 at 07:12
  • @MinxinYu-MSFT Thanks for the confirmation! – Yksisarvinen Sep 05 '22 at 11:29

1 Answer


Short answer: use $& to refer to the whole match. In this case the correct format string is:

    bstr = std::regex_replace(bstr, re, "|$&");

Long answer: Well, this is a rare case where MSVC is right and gcc and clang are (technically) buggy.

The default C++ regex grammar, and the default format rules used by std::regex_replace, are based on the ECMAScript standard. That standard defines $n (a single digit from 1 to 9) and $nn (two digits from 01 to 99) as references to capture group n or nn, so neither form accepts 0 as a group number. A sequence that is not a valid substitution should be left as is (i.e. treated like any other plain text).

This is what MSVC does: it recognizes that the $00 substitution is invalid per the standard and treats it as plain text. gcc and clang, on the other hand, treat group number 0 as the whole match, mirroring std::match_results, where index 0 (m[0]) is the whole match. A reasonable assumption, but technically incorrect by the standard.

This can also be confirmed with regex101: |$00 is replaced with exactly that text, while |$& is replaced with the match.

Yksisarvinen