I'm writing a small script for auto-correcting interwiki translation links with pywikibot. I look for the existing links and want to rewrite them in a standard format with links to all pages.
The text I'm looking for looks either somewhat like
{{Trad|EN=Under Spring|FR=Sources Interdites|DE=Verbotene Quellen}}
or multiline somewhat like
{{Trad
|DE=Urwurzeln
|EN=Prime Roots
|ES=Raíces Primarias
|FR=Primes Racines
|RU =Изначальных Корней
|H =
|palette=primes
}}
I manage to find these two instances in a wiki page source via
reg_strg = '{{trad([\w\s\|\=]*)}}'
rex = re.search(reg_strg, text, re.IGNORECASE | re.MULTILINE)
That gives me the body of the template, e.g. for the first case:
|EN=Under Spring|FR=Sources Interdites|DE=Verbotene Quellen
and similarly a multi-line string for the second.
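A self-contained sketch of that search step, for reference. The `text` value here is a made-up stand-in for a real page source, and the pattern is written as a raw string so that the backslashes reach the regex engine unchanged:

```python
import re

# Hypothetical page text for illustration; the real text comes from the wiki page.
text = "Intro\n{{Trad|EN=Under Spring|FR=Sources Interdites|DE=Verbotene Quellen}}\nMore text"

reg_strg = r'{{trad([\w\s\|\=]*)}}'
rex = re.search(reg_strg, text, re.IGNORECASE | re.MULTILINE)

# Group 1 is everything between "{{Trad" and "}}".
body = rex.group(1)
print(body)

# Splitting on '|' leaves an empty first element because the body starts with '|'.
parts = body.split('|')
print(parts)
```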
However, when I use the same reg_strg in a replace command, no replacement happens and text remains unmodified. new_strg is the replacement string, built from what was read. The result is the same whether new_strg is a multi-line string or just a simple "flobberigoo":
text = re.sub(reg_strg, new_strg, text, re.IGNORECASE | re.MULTILINE)
So obviously there is some difference between re.search and re.sub, but I fail to find it in the documentation. (I am aware of the difference between re.search and re.match, and I understood that re.sub should behave like re.search.)
What am I missing? How can I replace the text matched by this regex with a string?
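A minimal reproduction, independent of pywikibot, with a made-up one-line `text`. One thing it makes visible: the documented signature is `re.sub(pattern, repl, string, count=0, flags=0)`, so a fourth positional argument is taken as `count`, whereas in `re.search(pattern, string, flags=0)` the third positional argument is `flags`. The second call below passes the flags by keyword for comparison; this is a sketch of the behaviour, not necessarily the whole story for the full function:

```python
import re

# Hypothetical sample text; note the capital "T" while the pattern uses lowercase "trad".
text = "{{Trad|EN=Under Spring|FR=Sources Interdites}}"
reg_strg = r'{{trad([\w\s\|\=]*)}}'
flags = re.IGNORECASE | re.MULTILINE

# Flags passed positionally land in the `count` parameter, so the pattern
# is applied case-sensitively and does not match "{{Trad".
# (Recent Python versions may also emit a DeprecationWarning here.)
positional = re.sub(reg_strg, "flobberigoo", text, flags)

# Flags passed by keyword are actually used as flags.
keyword = re.sub(reg_strg, "flobberigoo", text, flags=flags)

print(positional)
print(keyword)
```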
For completeness' sake, this is the complete function including the debug prints:
def replace_translation_template(self, text, translations):
    """
    @param text The page text to look through
    @param translations dictionary of translations
    """
    reg_strg = '{{trad([\w\s\|\=]*)}}'
    rex = re.search(reg_strg, text, re.IGNORECASE | re.MULTILINE)
    print("Replacing:")
    try:
        print(rex.group(1))
        strgs = rex.group(1).split('|')
        print(strgs)
        new_strg = ""
        for lang,pagename in translations.items():
            if pagename is None:
                pagename = ""
            new_strg += '|' + lang.upper() + '=' + pagename + '\n'
            #print("New_strg: ", new_strg)
        for lang in translations.keys():
            for (n,str) in enumerate(strgs):
                if lang.upper() in str:
                    strgs.pop(n)
        for s in strgs:
            if len(s) > 2:
                new_strg += '|' + s + '\n'
        print(new_strg, '\n')
        print('\n with \n \n')
        text = re.sub(reg_strg, new_strg, text, re.IGNORECASE | re.MULTILINE)
        print(text)
    except:
        print("no text matched:", rex)