
I am trying to match chains of characters that are enclosed in a single, non-nested pair of opening and closing parentheses.

In the following examples, only the first two lines should return ab.
The other ones shouldn't match anything.

ab(ab)ac => ab
(ab)ndn => ab
ab(ab(ac)an) => void
ab((ab)ab)ab => void
ab(ab(abb))ab => void
ab(ab(ab(ab))ab) => void

This is as far as I could get for now; I don't know why the third line is still matched: https://regex101.com/r/weGhVz/2

tbop

1 Answer


You could use the regex negative lookahead feature to make sure that no further closing parenthesis follows the first closing one (with only word characters in between).

\((\w*)\)(?!\w*\))

This produces the output you want. Capture group 1 holds the sequence inside the parentheses.
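A minimal sketch of how this pattern could be applied with C++'s `std::regex`, whose default ECMAScript grammar supports `(?!...)` lookahead. The helper name `first_match` is hypothetical, and returning `std::optional` is just one way to tell "no match" apart from an empty capture such as `()`.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper: returns capture group 1 of the first match of the
// answer's pattern, or std::nullopt if the pattern does not match at all.
std::optional<std::string> first_match(const std::string& input) {
    // \((\w*)\)   an opening paren, word characters (group 1), a closing paren
    // (?!\w*\))   ...not followed by more word characters and another ')'
    static const std::regex pattern(R"re(\((\w*)\)(?!\w*\)))re");
    std::smatch m;
    if (std::regex_search(input, m, pattern)) {
        return m[1].str();
    }
    return std::nullopt;
}
```

With this, `first_match("ab(ab)ac")` and `first_match("(ab)ndn")` yield `"ab"`, and the four remaining examples yield `std::nullopt`. Note that unbalanced input such as `"(ab(ab)abba"` still matches, because the pattern never looks at what comes before the opening parenthesis.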

Ashkan
  • Ah, thanks! I was experimenting with those in the meantime but still couldn't get it right. Let me post the updated version of my original link as a comment. Thank you very much! https://regex101.com/r/weGhVz/3 – tbop Jan 30 '19 at 14:22
  • Note, though, that strings like "ab(ab)ab)" will not be matched. https://regex101.com/r/weGhVz/4 – tbop Jan 30 '19 at 14:26
  • But strings like "(ab(ab)abba" will be. I have to think about whether I can fix that. – tbop Jan 30 '19 at 14:27
  • Interestingly, "ab(ab))" does match. https://regex101.com/r/weGhVz/5/ – tbop Jan 30 '19 at 14:29
  • I was going to add that I'm assuming your parentheses are balanced. Also, as far as I know, C++ regex doesn't support negative lookbehind. – Ashkan Jan 30 '19 at 14:30
  • I'll try to come up with a better generalization of this. – Ashkan Jan 30 '19 at 14:31
  • You are using + instead of * in the negative lookahead. That is why you match "(ab))".
  • Ach, no way? Indeed, it doesn't support it. Deplorable. I've also found patterns that the C++ regex engine rejects but other engines accept, with differences not only in supported features but also in how a pattern is interpreted. For instance, in the regex "ab{blabla}ab", C++ interprets the {} as a repetition (range) quantifier, so the braces must be escaped, whereas other engines treat them as literal characters. Ranting aside, it seems other engines are just more lax in this case. – tbop Jan 30 '19 at 14:34
  • The underlying goal is to avoid parsing nested patterns with unbalanced parentheses. – tbop Jan 30 '19 at 14:36
  • Exactly. A quick and dirty solution is to simply reject strings with unbalanced parentheses. You could even solve the whole problem that way: count the parentheses, +1 for each opening one and -1 for each closing one. – Ashkan Jan 30 '19 at 14:39
  • Hmm, not sure this would work. Wouldn't the string "ab)ab(ab" be incorrectly matched? – tbop Jan 30 '19 at 14:54
  • Which one, the counting or the regex? With counting, you should also check that the count never goes negative. – Ashkan Jan 30 '19 at 15:16
  • Just a small update: apparently Boost.Regex supports lookbehind. Its syntax is based on Perl's. – Ashkan Jan 31 '19 at 08:19
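The counting idea discussed in these comments can be sketched without any regex. This is a minimal sketch, assuming the goal is to return the contents of every top-level `(...)` group that contains no nested parentheses, and to reject the whole string when the count ever goes negative or does not end at zero. The name `flat_groups` is hypothetical; unlike the `\w*` pattern, it accepts any characters between the parentheses.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the contents of every top-level "(...)" group
// that contains no nested parentheses. If the parentheses in `input` are
// unbalanced (a ')' with no open '(' or a '(' never closed), returns nothing.
std::vector<std::string> flat_groups(const std::string& input) {
    std::vector<std::string> result;
    int depth = 0;        // +1 for '(', -1 for ')'
    bool nested = false;  // did the current top-level group contain a '('?
    std::string current;  // characters collected inside the current group

    for (char c : input) {
        if (c == '(') {
            ++depth;
            if (depth == 1) {      // start of a new top-level group
                current.clear();
                nested = false;
            } else {
                nested = true;     // a '(' inside a group means nesting
            }
        } else if (c == ')') {
            --depth;
            if (depth < 0) {       // closing without opening: unbalanced
                return {};
            }
            if (depth == 0 && !nested) {
                result.push_back(current);  // a flat, single-level group
            }
        } else if (depth == 1) {
            current += c;          // only record text at the top level
        }
    }
    if (depth != 0) {              // some '(' was never closed
        return {};
    }
    return result;
}
```

On the question's examples this returns `{"ab"}` for the first two lines and an empty vector for the rest. It also rejects unbalanced strings such as `"ab)ab(ab"` (the count goes negative) and `"(ab(ab)abba"` (the count ends at 1), the latter of which the lookahead pattern still matches.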