My regex query is the following (demo):

(?'a'[~_])(?=(?!\s)(?:(?!\k'a').)+(?<!\s)\k'a')|(?=(?=(?'b'[\s\S]*))(?'c'\k'a'(?!\s)(?:(?!\k'a').)+(?<!\s)(?=\k'b'\z)|(?<=(?=x^|(?&c))[\s\S])))\k'a'

The problem I'm facing is that backreferences to the named capture group (?'a'[~_]) fail to match in the part of the query to the right of the main pipe:

(?=(?=(?'b'[\s\S]*))(?'c'\k'a'(?!\s)(?:(?!\k'a').)+(?<!\s)(?=\k'b'\z)|(?<=(?=x^|(?&c))[\s\S])))\k'a'

They do however work on the part to the left of the pipe:

(?'a'[~_])(?=(?!\s)(?:(?!\k'a').)+(?<!\s)\k'a')

The purpose of the query is to match only the surrounding delimiters of strings such as ~test~ or _test_, with a few additional criteria. It does this by first matching the opening delimiter with a lookahead (demo), and then using a variable-length lookbehind to match the closing delimiter (demo with literals instead of backreferences).

While I am aware the query could be greatly simplified using \K or capture groups, neither is an option for me.
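
To make the target behavior concrete, here is a minimal reference sketch in Python that reports which positions the query is supposed to match. It deliberately uses capture groups, which are not an option for the real query, so it only serves as a way to check expected results. The names `PAIR` and `delimiter_positions` are hypothetical, and the pattern is an assumed translation of the delimiter rules into Python's `re` syntax, covering only the "no whitespace directly inside the delimiters" criterion.

```python
import re

# Assumed translation into Python's `re` syntax:
# (?P<open>...) and (?P=open) stand in for (?'a'...) and \k'a'.
PAIR = re.compile(
    r"(?P<open>[~_])"        # opening delimiter
    r"(?!\s)"                # content must not start with whitespace
    r"(?:(?!(?P=open)).)+"   # one or more characters that are not the delimiter
    r"(?<!\s)"               # content must not end with whitespace
    r"(?P<close>(?P=open))"  # closing delimiter, same character as the opening one
)


def delimiter_positions(text):
    """Return the indices of each opening and closing delimiter in text."""
    positions = []
    for m in PAIR.finditer(text):
        positions.append(m.start("open"))
        positions.append(m.start("close"))
    return positions


print(delimiter_positions("~test~"))  # [0, 5]
print(delimiter_positions("_a_b_"))   # [0, 2]
```

A query that matches only the delimiters should produce matches at exactly these indices.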

  • In regex ``(x)|\1`` the right side of the pipe starts executing only when the left side fails. When the left side fails, capture group 1 is not set, so ``\1`` never succeeds. – Michail Oct 21 '22 at 01:29
  • You could use this regex: ``(?'a'[~_])(?!\s)(?:(?!\k'a').)+(?<!\s)(?'b'\k'a')`` The first delimiter is captured by capture group 'a' and the second by capture group 'b'. – Michail Oct 21 '22 at 01:40
  • @Michail while that would indeed be a lot easier, I unfortunately cannot use capture groups outside the query itself (edited question to mention this); it's necessary for the query to match the delimiters and only the delimiters. – OunceOfShag Oct 21 '22 at 09:10
  • Please clarify: in the text ``_a_b_``, do you need to match only the first and second ``_``, or all three? – Michail Oct 21 '22 at 09:46
  • @Michail Ideally only the first and the second – OunceOfShag Oct 22 '22 at 09:43
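
The point about ``(x)|\1`` in the first comment can be checked with a small experiment. The sketch below uses Python's `re` module rather than the regex flavor from the question, so treat it as an analogy: in `re`, a backreference to a group that did not participate in the match fails. The variable names are only illustrative.

```python
import re

# (x) sits in the left branch; \1 sits alone in the right branch.
pattern = re.compile(r"(x)|\1")

# The left branch matches wherever there is an "x".
left_hits = [m.group(0) for m in pattern.finditer("axbx")]

# On input without "x", only the right branch can be tried. Group 1 is
# unset at that point, so the backreference fails and nothing matches.
right_hit = pattern.search("ab")

print(left_hits)
print(right_hit)
```

Every match comes from the left branch; the right branch never contributes one, because group 1 is only ever set inside the branch that was not taken.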

1 Answer

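Your regex is great; it just needs a small correction. The main change is that group 'a' is matched once, before the alternation, so it is set in both branches.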

(?'a'[~_])(?=
   (?'d'(?!\s)(?:(?!\k'a').)+(?<!\s)\k'a') |
   (?=(?'b'.*))(?'c'
      ^(?>\k'a'(?&d)|.)*\k'a'(?&d)(?=\k'b'\z) |
      (?<=(?=x^|(?&c)).)
   )
)

Demo

However, I expect a regex like this to perform poorly.

Michail
  • This looks absolutely perfect, thanks. Performance isn't an issue since I don't plan on parsing over-long files, and this is probably already better than what I had before. – OunceOfShag Oct 23 '22 at 18:53