
I am new to regular expressions and Tcl, and I have been stuck on a very basic issue for a long time now.

I have been given the task of finding all the characters in a given word whose immediately following character is not identical to them. I have written the following Tcl snippet to achieve this:

set str "goooo";
set lst [regexp -all -inline {(\w)[^\1]} $str];
puts $lst

I am getting the following error:

couldn't compile regular expression pattern: invalid escape \ sequence
    while executing
"regexp -all -inline {(\w)[^ \1]} $str"

Is there any other way to use backreferences in Tcl?

Wiktor Stribiżew
Sparsh

1 Answer


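Backreferences cannot be used inside bracket expressions in any regex flavor, because bracket expressions are meant to contain exact literal characters or ranges of them. In flavors that accept the pattern, [^\1] matches any char but a \x01 char (the \1 is read as an octal escape); Tcl rejects it at compile time, which is the error you are seeing.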
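As a minimal sketch, the compile failure can be observed without aborting the script by wrapping the call in catch. The variable names failed and msg are only illustrative:

```tcl
set str "goooo"
# catch returns a non-zero code when the pattern cannot be compiled
set failed [catch {regexp -all -inline {(\w)[^\1]} $str} msg]
if {$failed} {
    # msg holds the "couldn't compile regular expression pattern" text
    puts "regexp failed: $msg"
}
```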

In your case, you can collapse each run of repeated chars matched by (\w)\1+ into a single char (using the \1 backreference in the replacement pattern) and then extract the word chars:

set lst [regexp -all -inline {\w} [regsub -all {(\w)\1+} $str {\1}]];

See the online demo:

set str "sddgoooo";
set lst [regexp -all -inline {\w} [regsub -all {(\w)\1+} $str {\1}]];
puts $lst

Output:

s d g o
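To make the two steps easier to follow, here is the same one-liner split apart, with the intermediate string stored in a variable. The name collapsed is just an illustrative choice:

```tcl
set str "sddgoooo"
# Step 1: collapse each run of a repeated word char into one char
set collapsed [regsub -all {(\w)\1+} $str {\1}]
puts $collapsed   ;# sdgo
# Step 2: return every remaining word char as a list element
set lst [regexp -all -inline {\w} $collapsed]
puts $lst         ;# s d g o
```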

Note that in other regex flavors, you could use a regex with a negative lookahead: (\w)(?!\1). The (?!\1) negative lookahead matches a location that is not immediately followed by the Group 1 value. Unfortunately, although Tcl AREs generally support lookaheads, they do not support backreferences inside them.
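If you want the lookahead semantics directly, one option is to skip regexes and compare each char with its successor in a plain loop. This is only a sketch of an alternative, not part of the regsub approach above, and charsNotFollowedBySame is a hypothetical helper name:

```tcl
# Return every word char of str whose next char differs from it
# (the last char of the string always qualifies).
proc charsNotFollowedBySame {str} {
    set result {}
    set len [string length $str]
    for {set i 0} {$i < $len} {incr i} {
        set ch [string index $str $i]
        # string index returns an empty string past the end of str
        set next [string index $str [expr {$i + 1}]]
        if {[string is wordchar -strict $ch] && $ch ne $next} {
            lappend result $ch
        }
    }
    return $result
}

puts [charsNotFollowedBySame "sddgoooo"]   ;# s d g o
```

For the sample input this yields the same list as the regsub-based one-liner.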

Wiktor Stribiżew
  • It does support lookaheads, but doesn't support backreferences within them (because it uses a different DFA to match lookaheads, if I've understood the *extremely complicated* code in the RE engine correctly). – Donal Fellows Jul 21 '22 at 13:33
  • @DonalFellows I updated the end of the answer to reflect this. – Wiktor Stribiżew Jul 21 '22 at 13:41