
My question may sound stupid, but is there something that doesn't work with Python 3.x and re.sub? This simple code seems to be wrong: it doesn't print '(026) 660 68 21' on my console. Can someone help me?

import re    
word = "Tél : (026) 660 68 21"
w = re.sub("Tél : ", "", word, count=1)
print(w)
wim
jjyoh

1 Answer


I cannot reproduce the issue with the code from your question.

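But what you are seeing can happen if, for example, the é (e with acute accent) is encoded in a different Unicode form in the two strings, e.g. precomposed (U+00E9) in one and decomposed ('e' followed by U+0301) in the other: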

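>>> import re
>>> word = "Te\u0301l : (026) 660 68 21"  # decomposed: 'e' + U+0301 combining acute accent
>>> substring = "T\u00e9l : "             # precomposed: single code point U+00E9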
>>> re.sub(substring, "", word, count=1)
'Tél : (026) 660 68 21'
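
Because the two forms render identically, it can help to dump the code points of each string to see which form you actually have. The following is a minimal sketch; `describe` is a hypothetical helper name, and the two sample strings are built with explicit escapes as an assumption about what the scraped data might contain:

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]

# Assumed sample data: the same visible text in two different Unicode forms.
composed = "T\u00e9l"      # é as a single precomposed code point
decomposed = "Te\u0301l"   # 'e' followed by a combining acute accent

print(describe(composed))
print(describe(decomposed))
print(composed == decomposed)  # the strings look the same but compare unequal
```

Running `describe` on the scraped string and on the pattern should reveal any mismatch between them.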

The fix:

>>> from unicodedata import normalize
>>> def n(str_, form='NFC'):
...     return normalize(form, str_)
...
>>> re.sub(n(substring), "", n(word), count=1)
'(026) 660 68 21'
wim
  • It's also possible that one of the whitespace characters is not U+0020 but one of [these](https://codepoints.net/search?gc=Zs) (classically U+00A0, the no-break space). And if someone pulled a prank, the colon might be [this](https://codepoints.net/U+2236). – Boldewyn Apr 08 '16 at 22:36
  • I was thinking about the é too, because the string is generated by data scraping, so that is probably the problem. Your code doesn't work for me, but thanks for the intuition; I'm going to look into it from that angle. – jjyoh Apr 08 '16 at 22:58
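
The suggestions in the answer and comments could be combined into one more tolerant approach. This is only a sketch under assumptions: `strip_prefix` and `PREFIX` are hypothetical names, and the pattern only covers the variants mentioned in this thread (decomposed é, non-U+0020 whitespace, and U+2236 in place of the colon), which may not be what the scraped data actually contains:

```python
import re
from unicodedata import normalize

# Hypothetical pattern: "Tél", optional whitespace, a colon or U+2236 (RATIO),
# then optional whitespace. In Python 3, \s on str patterns also matches
# Unicode whitespace such as U+00A0 (no-break space).
PREFIX = re.compile(r"^\s*T\u00e9l\s*[:\u2236]\s*")

def strip_prefix(text):
    """Remove a leading 'Tél :' label, tolerating Unicode look-alikes."""
    # NFC merges 'e' + U+0301 into the single code point U+00E9,
    # so the precomposed é in the pattern can match.
    text = normalize("NFC", text)
    return PREFIX.sub("", text, count=1)

print(strip_prefix("Te\u0301l\u00a0: (026) 660 68 21"))
```

If this still leaves the prefix in place, inspecting the individual code points of the scraped string is likely the quickest way to find the remaining mismatch.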