
I have been experimenting with regular expressions and tried to match the outer {% tag xyz %}{% endtag %} pair in the following text:

{% tag xyz %}
   {% tag abc %}
   {% endtag %}
{% endtag %}

My regular expression looks as follows and works so far:

({%)\s*(tag)([^%}]*?)(?:\s*(?:(%})((?:(?:[^{%]*?)|(?R))*)(?:({%)\s*(end\2)\s*(%}))))

But whenever the text inside the matching tags contains a single { or % sign, the regex doesn't work as expected. I think it's because the negated character class [^{%] rejects not only the sequence {% but also a lone { or % as single characters. I tried a lot of variations by trial and error, but without success.

Any help on that issue?

I set up two regex101 links to show the issue:

Any help is really appreciated!

techworker
  • This is not something that standard regular expressions handle (in the general case). If you have PCRE (Perl-compatible Regular Expressions), and they're compatible enough, you may be able to use the Perl features that support nested expressions, but those 'regular expressions' are not really 'regular' any more. You should give a little more context on where you are planning to use this code. – Jonathan Leffler Jan 17 '15 at 01:37

1 Answer


Try replacing [^{%] with (?:(?!{%).) and adding the s (PCRE_DOTALL) flag.

The negative lookahead allows any character in between, including a { that is not followed by %, and the s flag lets . match newlines as well.
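As a rough illustration of the difference, here is a small sketch using Python's built-in `re` module instead of PCRE (the sample string is made up, and `re.DOTALL` stands in for the s flag). It only compares the two "content" sub-patterns, not the full recursive expression:

```python
import re

# Hypothetical sample: tag content containing a lone '{' and a lone '%'.
text = "a { b % c {% endtag %}"

# Negated character class: stops at the first '{' or '%',
# even when they are not part of the '{%' delimiter.
char_class = re.compile(r"[^{%]*")

# Lookahead plus dot: consumes any character unless '{%' starts here.
tempered = re.compile(r"(?:(?!\{%).)*", re.DOTALL)

class_result = char_class.match(text).group()
tempered_result = tempered.match(text).group()

print(repr(class_result))     # 'a '
print(repr(tempered_result))  # 'a { b % c '
```

The character class gives up at the lone {, while the lookahead version runs up to the real {% delimiter.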

Test your updated pattern, or here is another starting point to try:

/{% tag \w+ %}(?:(?:(?!{%).)|(?0))*{% endtag %}/gs

test at regex101
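The (?0) recursion above needs PCRE or a compatible engine. Where recursion is not available (Python's built-in `re` module, for example, does not support it), a different technique can find the same outer blocks: scan for the delimiters and keep a depth counter. The following is only a sketch under that assumption; `outer_blocks`, `TAG` and the sample text are names made up for illustration, and it is not the pattern from this answer:

```python
import re

# Matches an opening '{% tag name %}' or a closing '{% endtag %}' delimiter.
TAG = re.compile(r"\{%\s*(tag\b[^%]*?|endtag)\s*%\}")

def outer_blocks(text):
    """Return the source text of each outermost tag ... endtag block."""
    blocks, depth, start = [], 0, None
    for m in TAG.finditer(text):
        if m.group(1) == "endtag":
            if depth == 0:
                continue  # stray closing tag: ignored in this sketch
            depth -= 1
            if depth == 0:
                blocks.append(text[start:m.end()])
        else:
            if depth == 0:
                start = m.start()  # remember where the outer block begins
            depth += 1
    return blocks

# Hypothetical sample with a lone '{' and '%' inside the outer tag.
sample = """{% tag xyz %}
   { 100% {% tag abc %}
   {% endtag %}
{% endtag %}"""

print(outer_blocks(sample) == [sample])  # True
```

Because only complete {% ... %} delimiters are matched, a single { or % in the content is simply skipped over.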

Jonny 5
  • Thank you Jonny. I tried the lookahead approach too, but I used it wrong. It seems like the lookahead always needs something to _look ahead_ to ;-) I forgot the `.` and just tried `(?:(?!{%))` instead of `(?:(?!{%).)`. – techworker Jan 17 '15 at 13:17
  • @techworker The reason is that you need to consume characters to move forward. `(?:(?!{%))*` would quantify just a zero-width position, which stalls, just as `()+` stays at the same position [see example](https://regex101.com/r/lB7rT4/1). Lookarounds don't move through the string; they look in one direction from the current position. Glad it works now :) – Jonny 5 Jan 17 '15 at 13:51
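
The zero-width behaviour described in these comments can be checked with a short sketch, again using Python's built-in `re` module rather than PCRE (the sample string is made up): the lookahead alone matches without consuming anything, while the lookahead followed by `.` consumes exactly one character.

```python
import re

# Hypothetical sample text.
text = "ab{% endtag %}"

# Lookahead alone: succeeds at position 0 but consumes nothing.
alone = re.match(r"(?!\{%)", text)

# Lookahead plus '.': asserts first, then consumes one character.
with_dot = re.match(r"(?!\{%).", text, re.DOTALL)

print(alone.span())     # (0, 0)
print(with_dot.span())  # (0, 1)
```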