
How to parse a backslash character in Tcl?

I have a pattern with the value "\Q[9]_i_1_n_0", and I want to find a line $line that contains this pattern. How can I do that?

(puts $pattern prints {\Q[9]_i_1_n_0}, but I use a foreach j [split $pattern] loop, so $j is just \Q[9]_i_1_n_0.)

regexp $pattern $pattern 

does not work:

Error: couldn't compile regular expression pattern: invalid escape \ sequence

lsearch $pattern $pattern returns -1

string match $pattern $pattern returns 0.

regexp {$pattern} $pattern returns 0

Donal Fellows
user2921643

2 Answers

set pattern {\Q[9]_i_1_n_0}

string first $pattern $pattern
# => 0

Matching with string first compares the text content of both strings without assigning any special meaning to characters. A result of 0 means that a match was found in position 0 (if there is no match, you get -1). string first won't tell you if you've found an exact match: for that you need to ascertain that the result is 0 and the length of the strings is the same.
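As a minimal sketch of the two checks just described (the sample line is made up for illustration and does not come from the question), a containment test and an exact-match test could look like this. string equal is used for the exact comparison; it gives the same answer as checking for index 0 plus equal lengths.

```tcl
# Hypothetical sample data, not taken from the question.
set pattern {\Q[9]_i_1_n_0}
set line {net \Q[9]_i_1_n_0 is driven by a LUT}

# Containment: any index >= 0 means the pattern occurs somewhere in the line.
set contains [expr {[string first $pattern $line] >= 0}]

# Exact match: string equal compares the two strings literally.
set exact [string equal $pattern $line]

puts "contains=$contains exact=$exact"
# prints: contains=1 exact=0
```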

With glob-style matching (string match) or regular-expression matching, you have to account for characters that are special in those matching languages. For example, \, *, ?, [, ] are special in glob-style matching, and \, ., *, +, ?, [, ], {, }, (, ), |, ^, $ are special in regular expression matching. "Special" in this context means that e.g. \ does not mean "backslash" but (in both cases) "escape", i.e. a character that takes away the "specialness" of another character. This means that for instance \\ does mean backslash, and \* does mean asterisk.

Since the pattern you are using contains \, [, and ], these characters need to be escaped before the pattern can be used for glob-style or regex matching. (Actually, by a syntactic quirk, a ] that closes an escaped [ doesn't need to be escaped.)

One of the easiest ways to escape these characters is by using a string translation operation performed by the string map command. One would think that this would do the trick:

string map {\ \\ [ \[} $pattern ;# error! this code won't work!

but that won't work since backslashes are still special in the string map command. We need to exactly double the number of backslashes in the map:

string map {\\ \\\\ [ \\[} $pattern

and now we can try to use glob-style / regex matching:

string match [string map {\\ \\\\ [ \\[} $pattern] $pattern
# => 1
regexp [string map {\\ \\\\ [ \\[} $pattern] $pattern
# => 1

The result of 1 means boolean truth: a match was found. Note that the results will differ if there is a prefix and/or suffix:

string match [string map {\\ \\\\ [ \\[} $pattern] abc${pattern}def
# => 0
regexp [string map {\\ \\\\ [ \\[} $pattern] abc${pattern}def
# => 1

This is because string match is implicitly anchored at both ends (the pattern has to match the whole string), while a regex needs to be explicitly anchored or it will ignore preceding or succeeding text.
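To get the opposite behaviour in each case, a rough sketch is shown here: add explicit * wildcards for a substring test with string match, and explicit ^ and $ anchors for a whole-string test with regexp. The helper names globEscape and reEscape are hypothetical, not built-in commands. They escape more characters than the map above: every glob metacharacter, and every non-word character for the regex, respectively.

```tcl
# Hypothetical helpers; the names are illustrative, not part of Tcl.

# Escape every glob metacharacter so string match treats the text literally.
proc globEscape {text} {
    return [string map {\\ \\\\ * \\* ? \\? [ \\[ ] \\]} $text]
}

# Escape every non-word character so regexp treats the text literally.
proc reEscape {text} {
    return [regsub -all {\W} $text {\\&}]
}

set pattern {\Q[9]_i_1_n_0}
set line abc${pattern}def

# Substring test with glob matching: explicit wildcards on both sides.
puts [string match *[globEscape $pattern]* $line]
# prints: 1

# Whole-string test with a regular expression: explicit anchors.
puts [regexp "^[reEscape $pattern]\$" $line]
# prints: 0
puts [regexp "^[reEscape $pattern]\$" $pattern]
# prints: 1
```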

Matching in a list is similar. lsearch -exact works like string first except that it will only accept exactly equal strings. lsearch -regexp and lsearch -glob work like regex and glob-style matching, respectively.

set list [concat abc $pattern def]
# => abc \Q[9]_i_1_n_0 def
lsearch -exact $list [join $pattern]
# => 1
lsearch -regexp $list [string map {\\ \\\\ [ \\[} [join $pattern]]
# => 1
lsearch -glob $list [string map {\\ \\\\ [ \\[} [join $pattern]]
# => 1

The result of 1 here means that the second element in the list (index 1) matched the pattern.

(The use of concat and join is a bit of low-level trickery to avoid having the braces in the string representation get in the way.)
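One caveat worth checking, sketched below: when a string produced by concat is read back as a list, list parsing applies backslash substitution, so the element that is actually searched no longer contains the backslash (join does the same to the pattern, which is why the two still agree). Building the list with list keeps each element intact. The variable names here are illustrative only.

```tcl
set pattern {\Q[9]_i_1_n_0}

# concat glues the strings together; reading the result back as a list
# applies list-level backslash processing to the middle word.
set viaConcat [concat abc $pattern def]
puts [lindex $viaConcat 1]
# prints: Q[9]_i_1_n_0

# list quotes each element as needed, so the backslash survives.
set viaList [list abc $pattern def]
puts [lindex $viaList 1]
# prints: \Q[9]_i_1_n_0

# With a properly built list, the unmodified pattern can be searched for.
puts [lsearch -exact $viaList $pattern]
# prints: 1
```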

Documentation: concat, join, lsearch, Syntax of Tcl regular expressions, regexp, string

Peter Lewerin

You've got a string with several characters in it that are metacharacters for both regexp and string match. In particular, both interpret backslashes and brackets to mean things by default. This means that plain lsearch (which defaults to -glob matching) won't find it, that lsearch -regexp won't work (invalid RE), and that lsearch -exact would only find it if it was the whole of the string (that option gives no points for a partial match).

But you can override the behaviour of regexp-style matching by putting ***= at the front of the pattern, provided you are looking for a literal:

set sampleText {this is a sample \Q[9]_i_1_n_0 with the pattern in it}
set pattern {\Q[9]_i_1_n_0}
puts [regexp ***=$pattern $sampleText]
# Prints 1... it matched!

Let's get some better matching information:

puts [regexp -inline -indices ***=$pattern $sampleText]
# {17 29}

Looks like it's right to me. This will also work with lsearch -regexp; the ***= trick is a feature of the RE engine core (which is shared).
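As a small sketch of that lsearch -regexp usage (the list of lines is invented for illustration), the same ***= prefix can pick out the lines that contain the pattern:

```tcl
set pattern {\Q[9]_i_1_n_0}

# Hypothetical input lines; in practice these might come from splitting file contents on newlines.
set lines [list {first line} {net \Q[9]_i_1_n_0 is here} {last line}]

# Index of the first line containing the literal pattern (-1 if none does).
set idx [lsearch -regexp $lines ***=$pattern]
puts $idx
# prints: 1

# All matching lines, returned as a list.
set hits [lsearch -all -inline -regexp $lines ***=$pattern]
puts [llength $hits]
# prints: 1
```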

Donal Fellows