
I am writing a regular expression that removes the newline after every closing tag delimiter (that is, every "%}\n") in a template string. However, it should not remove newlines from content between {% verbatim %} and {% endverbatim %}, and it should not remove the newline that follows {% endverbatim %}, if there is one.

I tried this:

import re
def my_function(template_string):

    replacement_string = template_string
    found = re.search("{%\s*verbatim\s*%}(\s*.*\s*){%\s*endverbatim\s*%}", template_string, re.DOTALL)
    replacement_string = re.sub("%}\n","%}", replacement_string, re.DOTALL)
    replacement = "{%% verbatim %%}%s{%% endverbatim %%}" % found.group(1)
    pattern = re.compile("{%\s*verbatim\s*%}(\s*.*\s*){%\s*endverbatim\s*%}", re.DOTALL)
    result_string = pattern.sub(replacement, replacement_string)
    return result_string

I used this string to test this regex:

"This is test string\n {% set var=2 %}\n {% verbatim %}\n Inside verbatim 1 {% set var2=4%}\n {% endverbatim %} {% set value=10%}\n {% verbatim%} Inside verbatim 2 {% set new_val=13%}\n {% endverbatim %}\n ..."

template_string = "This is test string\n {% set var=2 %}\n  {% verbatim %}\n Inside verbatim 1 {% set var2=4%}\n {% endverbatim %} {% set value=10%}\n {% verbatim%} Inside verbatim 2 {% set new_val=13%}\n {% endverbatim %}\n    ..."
my_function(template_string)

Output of the function above:

'This is test string\n {% set var=2 %} {% verbatim %}\n Inside verbatim 1 {% set var2=4%}\n {% endverbatim %} {% set value=10%}\n {% verbatim%} Inside verbatim 2 {% set new_val=13%}\n {% endverbatim %}
...'

Result I want:

'This is test string\n {% set var=2 %} {% verbatim %}\n Inside verbatim 1 {% set var2=4%}\n {% endverbatim %} {% set value=10%} {% verbatim%} Inside verbatim 2 {% set new_val=13%}\n {% endverbatim %}\n ...'

yash lodha

2 Answers


You can use

import re

template_string = "This is test string\n {% set var=2 %}\n  {% verbatim %}\n Inside verbatim 1 {% set var2=4%}\n {% endverbatim %} {% set value=10%}\n {% verbatim%} Inside verbatim 2 {% set new_val=13%}\n {% endverbatim %}\n    ..."
x = re.sub(r"(?s)((?:{%\s*verbatim\s*%}.*?)?{%\s*endverbatim\s*%})|%}\n", lambda m: (m.group(1) if m.group(1) else "%}"), template_string)
print(x)

See IDEONE demo

The (?s)((?:{%\s*verbatim\s*%}.*?)?{%\s*endverbatim\s*%})|%}\n regex matches:

  • (?s) - enables the DOTALL mode (. matches a newline, too)
  • ((?:{%\s*verbatim\s*%}.*?)?{%\s*endverbatim\s*%}) - Group 1, which matches:
    • (?:{%\s*verbatim\s*%}.*?)? - an optional (one or zero occurrences) sequence of {%, zero or more whitespace characters, verbatim, zero or more whitespace characters, and %}, followed by any characters, as few as possible, up to the first
    • {%\s*endverbatim\s*%} - {% endverbatim %}, with any amount of whitespace inside the delimiters
  • | - or...
  • %}\n - a %}+newline

In the replacement, a lambda checks whether Group 1 participated in the match (is not None): when only the %}\n alternative matches, Group 1 is unmatched, and a replacement pattern with \1 fails on an unmatched group in Python versions before 3.5. See "Empty string instead of unmatched group error" for more on this issue.
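As a minimal sketch of how the pattern and callback above could be packaged for reuse, the block below compiles the regex once and wraps the substitution in a function. The names TAG_NEWLINE, strip_tag_newlines and keep_or_strip are hypothetical, chosen for illustration only, and the sample string is a shortened stand-in for the test string in the question.

```python
import re

# Pattern from the answer above: group 1 captures a verbatim block
# (or a lone endverbatim tag); the second alternative matches "%}" + newline.
TAG_NEWLINE = re.compile(
    r"(?s)((?:{%\s*verbatim\s*%}.*?)?{%\s*endverbatim\s*%})|%}\n"
)

def strip_tag_newlines(template_string):
    """Hypothetical helper: drop the newline after a closing tag outside verbatim blocks."""
    def keep_or_strip(m):
        # Group 1 is None when only the second alternative matched.
        return m.group(1) if m.group(1) is not None else "%}"
    return TAG_NEWLINE.sub(keep_or_strip, template_string)

# Shortened sample (an assumption, not the original test string).
sample = "a {% set x=1 %}\nb {% verbatim %}\n{% set y=2 %}\n{% endverbatim %}\nc"
print(repr(strip_tag_newlines(sample)))
```

Because the whole verbatim block, including its closing %}, is consumed by group 1, the newline after {% endverbatim %} is never seen by the second alternative and so stays in place.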

Wiktor Stribiżew

You can use re.sub with a callback:

import re

template_string = "This is test string\n {% set var=2 %}\n  {% verbatim %}\n Inside verbatim 1 {% set var2=4%}\n {% endverbatim %} {% set value=10%}\n {% verbatim%} Inside verbatim 2 {% set new_val=13%}\n {% endverbatim %}\n    ..."

def replcb(m):
    # Group 1 is None when only the "%}\n" alternative matched
    if m.group(1) is None:
        return "%}"
    # Otherwise put the verbatim block back unchanged
    return m.group(1)

print(re.sub(r'({%\s*verbatim\s*%}[\s\S]*?{%\s*endverbatim\s*%})|%}\n', replcb, template_string))
  • This regex captures each block from the start tag through the end tag in group #1; otherwise it matches %}\n without any capturing group.
  • The replcb callback puts the captured text back in the output unchanged if m.group(1) is a valid capture; otherwise it replaces %}\n with %}.

Output:

This is test string
 {% set var=2 %}  {% verbatim %}
 Inside verbatim 1 {% set var2=4%}
 {% endverbatim %} {% set value=10%} {% verbatim%} Inside verbatim 2 {% set new_val=13%}
 {% endverbatim %}
    ...

Code Demo
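To see which branch of the alternation fires for each match, a small inspection with re.finditer can help. This is only an illustrative sketch: it uses a pattern of the same shape as the one in this answer (a captured verbatim block, or %} followed by a newline), and the sample string and the labels "verbatim block" and "tag newline" are made up for the example.

```python
import re

# Same alternation as in the answer: captured verbatim block, or "%}" + newline.
pattern = re.compile(r'({%\s*verbatim\s*%}[\s\S]*?{%\s*endverbatim\s*%})|%}\n')

# Made-up sample: one tag newline outside and one inside a verbatim block.
sample = "{% set a=1 %}\n{% verbatim %}{% set b=2 %}\n{% endverbatim %}\n"

# Record, for each match, whether group 1 took part in it.
branches = [
    ("verbatim block" if m.group(1) is not None else "tag newline", m.group(0))
    for m in pattern.finditer(sample)
]
for kind, text in branches:
    print(kind, repr(text))
```

Only the first %}\n is reported as a tag newline; the one inside the verbatim block is swallowed by group 1, which is why the callback can return it untouched.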

anubhava
  • I want to remove newlines after every "%}" except the content in between {% verbatim%} {% endverbatim %} like: {% verbatim %} {% set var=10%}\n {% endverbatim %}. Newline doesn't get removed if "%}\n" appears in between {% verbatim %} {% endverbatim %} – yash lodha Feb 24 '16 at 10:20