
I'm writing a small pywikibot script that auto-corrects interwiki translation links. It looks for the existing links and rewrites them in a standard format that links to all translated pages.

The text I'm looking for is either a single-line template like

{{Trad|EN=Under Spring|FR=Sources Interdites|DE=Verbotene Quellen}}

or a multi-line one like

{{Trad
|DE=Urwurzeln
|EN=Prime Roots
|ES=Raíces Primarias
|FR=Primes Racines
|RU =Изначальных Корней
|H  = 
|palette=primes
}}

I can find both forms in the wiki page source with

reg_strg = r'{{trad([\w\s\|\=]*)}}'
rex = re.search(reg_strg, text, re.IGNORECASE | re.MULTILINE)

That gives me the body of the template (for the first case):

|EN=Under Spring|FR=Sources Interdites|DE=Verbotene Quellen

and similarly a multi-line string for the second.
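To check the search step on its own, the same pattern can be run against both sample templates. This is a minimal sketch: the samples are copied from above (a trailing space after `|H  =` is assumed), and the pattern is written as a raw string, which avoids the invalid-escape warning that recent Python versions give for `\w` in a normal string literal. In Python 3, `\w` also matches letters such as `í` and Cyrillic, and `\s` matches newlines, so the multi-line template is captured even without `re.MULTILINE` (that flag only changes how `^` and `$` behave).

```python
import re

# Same pattern as in the question, written as a raw string.
REG_STRG = r'{{trad([\w\s\|\=]*)}}'

single_line = "{{Trad|EN=Under Spring|FR=Sources Interdites|DE=Verbotene Quellen}}"
multi_line = (
    "{{Trad\n"
    "|DE=Urwurzeln\n"
    "|EN=Prime Roots\n"
    "|ES=Raíces Primarias\n"
    "|FR=Primes Racines\n"
    "|RU =Изначальных Корней\n"
    "|H  = \n"
    "|palette=primes\n"
    "}}"
)

for sample in (single_line, multi_line):
    # re.search takes flags as its third positional argument, so this call is fine.
    match = re.search(REG_STRG, sample, re.IGNORECASE | re.MULTILINE)
    print(repr(match.group(1)))
```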

However, when I use the same reg_strg in a replace command, no replacement happens and text stays unmodified. new_strg is built from what was read, to serve as the replacement string, but the result is the same whether new_strg is a multi-line string or just a simple "flobberigoo":

text = re.sub(reg_strg, new_strg, text, re.IGNORECASE | re.MULTILINE)

So apparently re.search and re.sub behave differently, but I can't find that difference in the documentation. (I know the difference between re.search and re.match, and I understood that re.sub searches the same way re.search does.)
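The difference is in the signatures rather than in the matching. Per the Python `re` documentation, `re.search(pattern, string, flags=0)` takes flags as its third argument, while `re.sub(pattern, repl, string, count=0, flags=0)` has `count` in the fourth position. Passing `re.IGNORECASE | re.MULTILINE` positionally therefore sets `count` to 10 and leaves the flags at 0, so the case-sensitive `trad` never matches `Trad`. The sketch below reproduces that call with an explicit `count=` (Python 3.13 deprecates passing `count` and `flags` to `re.sub` positionally) and compares it with passing the flags by keyword.

```python
import re

reg_strg = r'{{trad([\w\s\|\=]*)}}'
text = "{{Trad|EN=Under Spring|FR=Sources Interdites}}"
new_strg = "flobberigoo"

flags = re.IGNORECASE | re.MULTILINE
print(int(flags))  # 10: the value that ended up in count

# What the positional call actually did: count=10, no flags.
as_count = re.sub(reg_strg, new_strg, text, count=int(flags))
# Flags passed by keyword, so IGNORECASE is really applied.
as_flags = re.sub(reg_strg, new_strg, text, flags=flags)

print(as_count)  # unchanged: "trad" does not match "Trad" case-sensitively
print(as_flags)  # flobberigoo
```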

What am I missing? How can I replace the template that this regex finds in the pages with a string?
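Once the flags are passed by keyword, two more details are worth checking. The pattern matches the whole template, including `{{Trad` and `}}`, so the replacement has to write them back; and a plain-string replacement is scanned for backslash escapes and group references, which matters if a page name ever contains a backslash. A minimal sketch, assuming the new parameter lines are already assembled into one string (`new_body` and `replace_trad` are hypothetical names): it compiles the pattern once and passes a function as the replacement, whose return value is inserted literally.

```python
import re

# Compiling once stores the flags in the pattern object, so they cannot be
# confused with re.sub's positional count argument.
TRAD_RE = re.compile(r'{{trad([\w\s\|\=]*)}}', flags=re.IGNORECASE)

def replace_trad(text: str, new_body: str) -> str:
    """Replace every {{Trad ...}} template in text with one built from new_body.

    new_body is assumed to hold parameter lines such as "|EN=Prime Roots\n".
    """
    # A function as repl returns its text verbatim (no backslash processing),
    # and the template wrapper consumed by the match is written back.
    return TRAD_RE.sub(lambda m: "{{Trad\n" + new_body + "}}", text)

page = "Intro\n{{Trad|EN=Under Spring|FR=Sources Interdites}}\nOutro"
print(replace_trad(page, "|DE=Verbotene Quellen\n|EN=Under Spring\n"))
```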

For completeness' sake, this is the complete function, including the debug prints:

import re

def replace_translation_template(self, text, translations):
    """
    @param text The page text to look through
    @param translations dictionary of translations
    """
    reg_strg = r'{{trad([\w\s\|\=]*)}}'
    rex = re.search(reg_strg, text, re.IGNORECASE | re.MULTILINE)

    print("Replacing:")
    try:
        print(rex.group(1))
        strgs = rex.group(1).split('|')
        print(strgs)
        new_strg = ""
        for lang,pagename in translations.items():
            if pagename is None:
                pagename = ""
            new_strg += '|' + lang.upper() + '=' + pagename + '\n'

        #print("New_strg: ", new_strg)

        # Drop the parts that mention a language already in translations.
        strgs = [part for part in strgs
                 if not any(lang.upper() in part for lang in translations)]

        for s in strgs:
            if len(s) > 2:
                new_strg += '|' + s + '\n'
        print(new_strg, '\n')
        print('\n with \n \n')
        text = re.sub(reg_strg, new_strg, text, re.IGNORECASE | re.MULTILINE)
        print(text)

    except AttributeError:
        # rex is None when the pattern did not match.
        print("no text matched:", rex)
planetmaker
  • `text = re.sub(reg_strg, new_strg, text, flags=re.IGNORECASE | re.MULTILINE)` – Wiktor Stribiżew Jun 12 '20 at 20:01
  • Sorry, I didn't see that the question is already closed. I don't mean to correct you; your code is well done. However, if you rearrange it, I think you can write it more compactly and save one iteration. I would send you my modified version if I could. – Detlef Jun 12 '20 at 22:00
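The comment above suggests the function can be written more compactly with one fewer iteration. One possible arrangement is sketched here as a standalone function (not the commenter's version, which was never posted). It assumes `translations` maps language codes to page names or `None`, passes the flags through a compiled pattern, and avoids the nested loop over languages and parts. Unlike the original, it decides which parameters to keep by comparing the key before `=` with the known languages instead of using a substring test.

```python
import re

TRAD_RE = re.compile(r'{{trad([\w\s\|\=]*)}}', flags=re.IGNORECASE)

def replace_translation_template(text, translations):
    """Rewrite {{Trad ...}} templates so they list every translation.

    translations maps language codes (e.g. "en") to page names or None.
    """
    known = {lang.upper() for lang in translations}

    def rebuild(match):
        # One line per known translation; None becomes an empty value.
        lines = [f"|{lang.upper()}={name or ''}" for lang, name in translations.items()]
        # Keep other parameters (e.g. palette=...) whose key is not a known language.
        for part in match.group(1).split("|"):
            key = part.split("=", 1)[0].strip().upper()
            if part.strip() and key not in known:
                lines.append("|" + part.strip())
        return "{{Trad\n" + "\n".join(lines) + "\n}}"

    return TRAD_RE.sub(rebuild, text)

sample = "{{Trad|EN=Under Spring|FR=Sources Interdites|DE=Verbotene Quellen}}"
print(replace_translation_template(sample, {"en": "Under Spring", "es": None}))
```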
