I read the documentation of ?: in Tcl regexp, which says that it matches an expression without capturing it.

I tried it and it worked fine. My question is: what is the proper use case for this option? If we do not want to capture a sequence, we simply won't put parentheses there.

Is it just an alternative syntax, or is there some specific situation where we should use it? Kindly clarify.

George G
Dcode

3 Answers

Easy: you need to group several elements in your regex, but you don't need them as a capturing group for later reference.

a+ (b+|c+) OR (a+ b+)|c+

I need parentheses for grouping. But if I write it like this, the engine will capture all those matches. This may need a lot of memory and cost a lot of performance. If I don't need the capturing groups later for reference, I can use ?: to get grouping without the performance impact:

a+ (?:b+|c+) OR (?:a+ b+)|c+
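A small Tcl sketch of the difference, assuming made-up input strings: with regexp -inline, every capturing group adds an element to the returned list (even when the group does not take part in the match), while the (?:...) versions return only the overall match.

```tcl
# Hypothetical input chosen so that both variants of the first pattern match.
set s "aaa bbb"

# Capturing group: result is the whole match plus the group's text.
set withGroup    [regexp -inline {a+ (b+|c+)} $s]
# Non-capturing group: same grouping, but only the whole match is returned.
set withoutGroup [regexp -inline {a+ (?:b+|c+)} $s]

puts $withGroup      ;# {aaa bbb} bbb
puts $withoutGroup   ;# {aaa bbb}

# Second pattern on "ccc": the group does not participate in the match,
# yet the capturing version still reports an (empty) slot for it.
puts [llength [regexp -inline {(a+ b+)|c+} "ccc"]]    ;# 2
puts [llength [regexp -inline {(?:a+ b+)|c+} "ccc"]]  ;# 1
```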
glenn jackman
Falco
  • *This may need a lot of memory and cost a lot of performance.* That is actually not such a resource-consuming operation. The point is that the stack for the group is not created and populated, but that is not always a demanding operation. The main point is that the non-capturing group is used *only* for grouping and nothing else, so it does not disturb the order of capturing groups already defined in the pattern, or of those you will define later. – Wiktor Stribiżew Jul 18 '16 at 18:23
  • I ran some benchmarks in other questions where changing a single group to non-capturing reduced runtime by 30%. I think one could construct examples where the gain is much larger. – Falco Jul 18 '16 at 19:14
  • But many of these points are discussed in the great answer to the duplicate question; I think this one will soon be closed. – Falco Jul 18 '16 at 19:16
  • I won't close this one, however, I'd expect a much more detailed answer tailored for Tcl. Non-capturing groups are really helpful. – Wiktor Stribiżew Jul 18 '16 at 20:00
  • @Mariano, what about the point of writing unambiguous code for the next programmer? I try to write my regexes as clear and readable as possible. – Falco Jul 19 '16 at 06:48
First, have a look at the Tcl regex reference:

(expression)
Parentheses surrounding an expression specify a nested expression. The substring matching expression is captured and can be referred to via the back reference mechanism, and also captured into any corresponding match variable specified as an argument to the command.
(?:expression)
matches expression without capturing it.

While the first part, describing a capturing group's ability to capture a substring that can be referred to with backreferences, is universal, the second part, about filling match variables from capturing groups, is specific to Tcl.
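Both behaviours from the quoted reference can be seen in a short Tcl sketch; the input strings are made up for illustration, and "-" is used as the full-match variable name, as in the examples in this answer.

```tcl
set s "key=value"

# Each capturing group fills the next match variable after "-".
regexp {(\w+)=(\w+)} $s - k v
puts "$k / $v"        ;# key / value

# A non-capturing group is skipped when variables are assigned,
# so the first variable after "-" now receives the value.
regexp {(?:\w+)=(\w+)} $s - v2
puts $v2              ;# value

# Back reference: \1 must repeat the text captured by the first group.
puts [regexp {(\w)\1} "hello"]   ;# 1 (the doubled "ll")
puts [regexp {(\w)\1} "world"]   ;# 0
```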

Bearing that in mind, non-capturing groups can greatly simplify Tcl regex usage when you have a pattern with several capturing groups and need to modify it by adding another group between existing ones.

Say you want to match strings like abc 1234 (comment) using {(\w+)\s+(\d+)\s+\(([^()]+)\)}:

set a "abc 1234 (comment)"
regexp {(\w+)\s+(\d+)\s+\(([^()]+)\)} $a - body num comment

However, suppose you are also asked to match strings with any number of word + whitespace + digits sequences between 1234 and (comment). If you write

set a1 "abc 1234 more 5678 text 890 here 678 (comment)"
regexp {(\w+)\s+(\d+)(\s+\w+\s+\d+)*\s+\(([^()]+)\)} $a1 - body1 num1 comment1
                     ^^^^^^^^^^^^^^^

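then $comment1 will not hold the comment: the added group becomes capturing group 3, so comment1 receives text from the repeated part, while the comment itself (now group 4) is not assigned to any variable.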

Turning the added group into a non-capturing group, (?:\s+\w+\s+\d+)*, fixes the issue, because the numbering of the existing groups stays unchanged.
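A runnable sketch of both versions side by side, reusing the sample string from this answer; the variable names body2, num2 and comment2 are just illustrative.

```tcl
set a1 "abc 1234 more 5678 text 890 here 678 (comment)"

# Added part as a capturing group: it becomes group 3, so comment1
# receives text from the repeated part instead of the comment.
regexp {(\w+)\s+(\d+)(\s+\w+\s+\d+)*\s+\(([^()]+)\)} $a1 - body1 num1 comment1

# Added part as a non-capturing group: the comment stays group 3.
regexp {(\w+)\s+(\d+)(?:\s+\w+\s+\d+)*\s+\(([^()]+)\)} $a1 - body2 num2 comment2

puts "$body2 | $num2 | $comment2"   ;# abc | 1234 | comment
```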

See IDEONE demo

For other common uses of a non-capturing group, please refer to the "Are optional non-capturing groups redundant?" post.

Community
Wiktor Stribiżew
You can use (?:...) in a regex to group several alternative words that you need to match but do not want to capture.

(?:one|two|three)
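A minimal Tcl sketch of why the grouping matters, with made-up input strings: the group limits the alternation to the words themselves, and (?:...) keeps the number word out of the match variables.

```tcl
set s "I have two apples"

# Grouped alternation: " (\w+)" applies after any of the three words,
# and only the following word is captured.
if {[regexp {(?:one|two|three) (\w+)} $s - item]} {
    puts $item    ;# apples
}

# Without the group, the pattern means "one" OR "two" OR "three (\w+)",
# so here only "one" matches and the group does not take part.
regexp {one|two|three (\w+)} "one apple" - item2
puts "<$item2>"   ;# <>
```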
Finwe