
I'm trying to write a regex that replaces consecutive occurrences of specified characters with a single one. I used a backreference and a character class for that. However, some of the characters (`.` and `,` in my case) get removed completely, and I cannot figure out what I've missed.

Question: Why does it work that way?

Note: I'm using C++20.

std::string s{ "...?12 :: 54  ! !! ..,,,- ---" };
const std::regex re("([.,\\-:!?]){2,}");
s = std::regex_replace(s, re, "$1");

I expected to get `.?12 : 54 ! ! .,- -` but instead I get `?12 : 54 ! ! - -`. Escaping `.` and `,` didn't help either.
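One way to investigate is to list what each match actually covers and what group 1 holds at the end of it. The following is only a diagnostic sketch, not part of the original code; `list_matches` is a hypothetical helper name introduced here for illustration.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (whole match, text captured by group 1)
// for every match of `re` in `s`.
std::vector<std::pair<std::string, std::string>>
list_matches(const std::string& s, const std::regex& re) {
    std::vector<std::pair<std::string, std::string>> out;
    for (std::sregex_iterator it(s.begin(), s.end(), re), end; it != end; ++it) {
        out.emplace_back(it->str(0), it->str(1));  // str(1) is what "$1" expands to
    }
    return out;
}
```

Called with the pattern from the question, the first entry should be the whole match `...?` paired with the capture `?`, which suggests that the group is not tied to one repeated character.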

ENIAC
  • Are you trying to match the `-` character, or the range of characters from `\ ` to `:`? – Barmar Jul 05 '23 at 16:42
  • `([.,\\-:!?]){2,}` matches 2 or more characters from the set, in any mix. So it matches all 4 characters `...?` at the beginning. The group keeps only the last character it matched, so the replacement is `?`. – Barmar Jul 05 '23 at 16:45
  • Are you trying to match a sequence of identical characters? – Barmar Jul 05 '23 at 16:45
  • Barmar, yes. Sequences of identical characters should be replaced with single ones. – ENIAC Jul 05 '23 at 16:49
  • Use a capture group with a back-reference: `([...])\\1+` to match the same character. – Barmar Jul 05 '23 at 16:52
  • I'm not writing an answer because I'm sure there are duplicates, but I haven't been able to find a good one for C++. – Barmar Jul 05 '23 at 16:53
  • Barmar, thank you so much! Your comment (I upvoted it) solves the problem. Please do post it as an answer. – ENIAC Jul 05 '23 at 17:05
  • Interesting backreference magic. – ENIAC Jul 05 '23 at 17:07
  • it's basically what backreferences were created for. – Barmar Jul 05 '23 at 17:08
  • Barmar, I just cannot figure out why `\\1+` works, but `\\1{2,}` doesn't. – ENIAC Jul 05 '23 at 17:17
  • That will only match when the character occurs at least 3 times in a row. `([...])` matches the first one, and `\\1{2,}` matches at least 2 more. – Barmar Jul 05 '23 at 17:18
  • So it will match `...` but not `..` – Barmar Jul 05 '23 at 17:19
  • I'm going to suggest that even if it results in (marginally) more complex C++ code, a small loop to handle this will be a whole lot clearer than the regex involved in implementing it this way. – Jerry Coffin Jul 05 '23 at 17:30
  • @JerryCoffin, I agree with you, but I wanted to have a single line solution. – ENIAC Jul 05 '23 at 17:34
  • @Barmar, I think I got the idea. Thank you for your explanations! – ENIAC Jul 05 '23 at 17:35
  • @ENIAC: `remove_consecutive_dupes(some_string, ".,\-:!?");` ;-) Seriously though, "single line" seems like a terrible criterion to me. – Jerry Coffin Jul 05 '23 at 17:41
  • @JerryCoffin, I thought it was a C++ function, at first. LOL :) I'm the only contributor to my project, so that "terrible-look" solution is OK for me. – ENIAC Jul 05 '23 at 17:48
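The suggestions in the comments can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the function names are made up here, `remove_consecutive_dupes` only borrows the name from the joke in the comments (no such standard function exists), and the `[...]` placeholder is assumed to stand for the character class from the question.

```cpp
#include <regex>
#include <string>
#include <string_view>

// Back-reference version: the group captures one character and \1+ requires
// that same character at least once more, so every match is a run of
// identical characters and "$1" puts a single copy back.
std::string collapse_runs(const std::string& s) {
    static const std::regex re(R"(([.,\-:!?])\1+)");
    return std::regex_replace(s, re, "$1");
}

// Same pattern with \1{2,}: the group takes one character and the
// back-reference needs two more, so only runs of three or more match.
std::string collapse_runs_of_three_or_more(const std::string& s) {
    static const std::regex re(R"(([.,\-:!?])\1{2,})");
    return std::regex_replace(s, re, "$1");
}

// Loop version (hypothetical helper): skip a character when it repeats the
// previously kept one and belongs to `chars`.
std::string remove_consecutive_dupes(const std::string& s, std::string_view chars) {
    std::string out;
    out.reserve(s.size());
    for (char c : s) {
        const bool repeat = !out.empty() && out.back() == c &&
                            chars.find(c) != std::string_view::npos;
        if (!repeat) out.push_back(c);
    }
    return out;
}
```

With these definitions, `collapse_runs("..,,,-")` should give `.,-`, while `collapse_runs_of_three_or_more("..")` leaves `..` untouched. The loop version takes the character set as a plain string such as `".,-:!?"`, so no regex escaping is involved.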
