I'm trying to replace whole-word occurrences of a Cyrillic word in a text:
# -*- coding: utf-8 -*-
import re
S = u"раз Два трИ".lower()
print re.sub(ur"\bдва\b", u"четыре", S, re.U)
This prints "раз два три", while "раз четыре три" is expected.
At the same time, search() and findall() work well:
print re.search(ur"\bдва\b", S, re.U).group(0)
print re.findall(ur"\bдва\b", S, re.U)
So the only problem is with re.sub().
Latin chars work well:
S = u"one Two threE".lower()
print re.sub(ur"\btwo\b", u"four", S, re.U)
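A likely explanation for the difference: the signature is re.sub(pattern, repl, string, count=0, flags=0), so a fourth positional argument is read as count, not as flags, whereas search() and findall() take flags right after the string. With no flag applied, Python 2 treats only ASCII letters as word characters for \b, which would also explain why the Latin example still works. Below is a minimal sketch that passes the flag by keyword; it assumes Python 2.7 or later (where the flags keyword exists), and the names pattern and result are only illustrative.

```python
# -*- coding: utf-8 -*-
import re

s = u"раз Два трИ".lower()

# u"\\b..." is used instead of ur"\b..." so the sketch also runs on Python 3.
pattern = u"\\bдва\\b"

# Passing re.U by keyword applies it as a flag; as the fourth positional
# argument it would be interpreted as `count` instead.
result = re.sub(pattern, u"четыре", s, flags=re.U)
print(result)

# An alternative that avoids the argument order entirely: compile first.
compiled = re.compile(pattern, re.U)
result_compiled = compiled.sub(u"четыре", s)
```

Compiling the pattern first keeps the flag attached to the pattern, so it cannot be confused with count.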
If I try the following, it swallows the spaces (and looks ugly):
print re.sub(u"[^а-яё\d]два[^а-яё\d]", u"четыре", S)
An attempt to keep the spaces doesn't work either:
print re.sub(u"(?:[^а-яё\d])(два)(?:[^а-яё\d])", u"четыре", S)
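The spaces disappear in both attempts because the surrounding character classes consume the neighbouring characters, and a non-capturing group (?:...) still consumes what it matches. One way to check the neighbours without consuming them is to use negative lookarounds. This is only a sketch, keeping the same hand-written character class as above; the names boundary, pattern and result are illustrative.

```python
# -*- coding: utf-8 -*-
import re

s = u"раз два три"

# Characters that count as "part of a word" here (same class as in the attempts above).
boundary = u"[а-яё\\d]"

# (?<!...) and (?!...) assert that no word character precedes or follows,
# without including the neighbouring characters in the match.
pattern = u"(?<!" + boundary + u")два(?!" + boundary + u")"

result = re.sub(pattern, u"четыре", s)
print(result)

# Lookarounds also succeed at the very start and end of the string.
edges = re.sub(pattern, u"четыре", u"два одва два")
```

Unlike the character-class version, this also matches a word at the very beginning or end of the string, where there is no neighbouring character to consume.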
str.replace() doesn't help either:
S = u"раз Два трИ".lower()
print S
S.replace(u"два", u"четыре")
print S
This prints "раз два три" twice.
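The replace() result here is explained by string immutability: replace() returns a new string and leaves the original unchanged, so the return value has to be assigned. A minimal sketch (note that replace() substitutes every occurrence of the substring, not only whole words):

```python
# -*- coding: utf-8 -*-
s = u"раз Два трИ".lower()

# replace() does not modify s in place; keep the returned string.
s = s.replace(u"два", u"четыре")
print(s)
```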