
I'm trying to replace whole-word occurrences of a Cyrillic word in a text:

# -*- coding: utf-8 -*-
import re
S = u"раз Два трИ".lower()
print re.sub(ur"\bдва\b", u"четыре", S, re.U)

This prints "раз два три", but I expect "раз четыре три".

At the same time, search() and findall() work well:

print re.search(ur"\bдва\b", S, re.U).group(0)
print re.findall(ur"\bдва\b", S, re.U)

So the problem is only with re.sub().

Latin chars work well:

S = u"one Two threE".lower()
print re.sub(ur"\btwo\b", u"four", S, re.U)

If I try it the following way, it swallows the spaces (and looks ugly):

print re.sub(u"[^а-яё\d]два[^а-яё\d]", u"четыре", S)

An attempt to keep the spaces doesn't work either:

print re.sub(u"(?:[^а-яё\d])(два)(?:[^а-яё\d])", u"четыре", S)

str.replace() doesn't help either:

S = u"раз Два трИ".lower()
print S
S.replace(u"два", u"четыре")
print S

This prints "раз два три" twice.

Bach

1 Answer


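The fourth positional argument of re.sub() is count, not flags (the signature is re.sub(pattern, repl, string, count=0, flags=0)), so re.U is being interpreted as a replacement count and no flags are applied. search() and findall() work because their third positional argument is flags. You should pass the flags with the keyword argument flags: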

In [3]: S = u"раз Два трИ".lower()
In [5]: print re.sub(ur"\bдва\b", u"четыре", S, flags=re.U)
раз четыре три
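
As a rough sketch of the same idea in script form: the snippet below keeps count and flags apart by always naming them, and also shows a precompiled pattern, where the flag is attached to the pattern itself and cannot be confused with count. The variable names (word, fixed, first_only and so on) are only illustrative, and the __future__ imports are an assumption made so that the same file runs under both Python 2.7 and Python 3. The last lines touch on the str.replace() attempt from the question: strings are immutable, so the result has to be assigned.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals
import re

S = "раз Два трИ".lower()

# Signature: re.sub(pattern, repl, string, count=0, flags=0).
# Naming the flags argument keeps re.U from being read as count.
fixed = re.sub(r"\bдва\b", "четыре", S, flags=re.U)

# Alternative: compile once, so the flag belongs to the pattern object.
word = re.compile(r"\bдва\b", re.U)
also_fixed = word.sub("четыре", S)

# count and flags are independent; here only the first match is replaced.
first_only = re.sub(r"\bдва\b", "четыре", "два два два", count=1, flags=re.U)

# str.replace() returns a new string; S itself is never modified,
# so the result must be assigned to be visible.
replaced = S.replace("два", "четыре")

print(fixed)
print(also_fixed)
print(first_only)
print(replaced)
```

Note that str.replace() matches substrings rather than whole words, so it is not a drop-in substitute for the \b version.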
Umair Khan