
I have a very long text with parts enclosed in +++, and I would like to enclose those parts in double square brackets instead.

se1 = "+++TEXT:+++ Moshe Morgenstern is on his way to the main synagogue in the center of Bnei Brak, home to a largely ultra-orthodox - or haredi - community. +++ Karte Israel mit: Bnei Brak, Tel Aviv + Jerusalem ))+++"

I would like to convert the text enclosed in +++ so that it is enclosed in [[ and ]] instead. For example,

+++TEXT+++ should become [[TEXT]]

My code:

import re


se1 = "+++TEXT:+++ Moshe Morgenstern is on his way to the main synagogue in the center of Bnei Brak, home to a largely ultra-orthodox - or haredi - community. +++ Karte Israel mit: Bnei Brak, Tel Aviv + Jerusalem ))+++"

comments = re.sub(r"\+\+\+.*?\+\+\+", r"[[.*?]]", se1)
print(comments)

But it gives the wrong output:

[[.*?]] Moshe Morgenstern is on his way to the main synagogue in the center of Bnei Brak, home to a largely ultra-orthodox - or haredi - community. [[.*?]]
user343

2 Answers

1

You need to capture the text between the delimiters as a group with () and then reference that captured group in the replacement string with \1.

This should work fine:

>>> comments = re.sub(r"\+\+\+(.*?)\+\+\+", r"[[\1]]", se1)
>>> comments
'[[TEXT:]] Moshe Morgenstern is on his way to the main synagogue in the center of Bnei Brak, home to a largely ultra-orthodox - or haredi - community. [[ Karte Israel mit: Bnei Brak, Tel Aviv + Jerusalem ))]]'

Note that \+\+\+ can also be written more compactly as \+{3}.
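
As a minimal sketch of that shorter form, assuming a shortened sample string (the names `sample`, `pattern`, and `result` are illustrative and not from the question), the quantified pattern should behave the same as the spelled-out one:

```python
import re

# Shortened, hypothetical sample in the same shape as se1.
sample = "+++TEXT:+++ some prose with a + sign. +++ second note +++"

# \+{3} matches exactly three literal plus signs; (.*?) lazily captures
# everything up to the next run of three plus signs.
pattern = re.compile(r"\+{3}(.*?)\+{3}")

# \1 in the replacement inserts whatever group 1 captured.
result = pattern.sub(r"[[\1]]", sample)
print(result)
```

The lone + inside the prose is left untouched, because the pattern only matches runs of three plus signs.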

Damiox
1

You can use this:

re.sub(r'\+\+\+(.*?)\+\+\+', r'[[\1]]', se1)

In your code, the .*? in the second string is treated as literal text, not as a stand-in for whatever .*? matched in the pattern. Writing (.*?) in the pattern saves the matched part so that it can be used in the replacement string, and \1 refers to that saved text.
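
The difference can be seen side by side in a small sketch, assuming a short made-up input (`sample`, `literal`, and `captured` are illustrative names). It uses \g<1>, which Python's re module accepts as an equivalent spelling of \1:

```python
import re

sample = "+++TEXT+++ plain words"  # hypothetical short input

# No group: the replacement is copied verbatim, so ".*?" appears literally.
literal = re.sub(r"\+\+\+.*?\+\+\+", r"[[.*?]]", sample)

# With a group: \g<1> (same as \1) inserts the captured text.
captured = re.sub(r"\+\+\+(.*?)\+\+\+", r"[[\g<1>]]", sample)

print(literal)   # [[.*?]] plain words
print(captured)  # [[TEXT]] plain words
```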

David