I'm not familiar with Arabic text, but I do know that Arabic letters work differently than letters in Latin/English. Nearby letters somehow affect each other, which might be the source of confusion here.
Here's what happens when you carry out the replacement (in Python 3):
text = "صَوتُ صَفيرِ البُلْبُلِ"
text2 = text.replace("ص", "")
print(text) # صَوتُ صَفيرِ البُلْبُلِ
print(text2) # وتُ َفيرِ البُلْبُلِ
The comments above are copied from the printed output. However, if you paste them back in, the copied output of text2 is in fact not identical to text2: something is missing from the printout (this is not the case for the original text). I imagine that text2 cannot be printed correctly at all (i.e. in Arabic, some combinations of characters/symbols do not result in meaningful text).
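One way to see what the printout hides is to look at the escaped form of the string and at the combining marks that lost their base letter. This is only a rough sketch: orphaned_marks is a hypothetical helper, and its definition of "orphaned" (a combining mark at the start of the string or right after whitespace) is my own simplifying assumption, not a rule of Arabic typography.

```python
import unicodedata

def orphaned_marks(s):
    """Return the indices of combining marks that have no base letter before them.

    Assumption for this sketch: a mark counts as "orphaned" if it is the
    first character of the string or directly follows whitespace.
    """
    indices = []
    for i, ch in enumerate(s):
        is_mark = unicodedata.combining(ch) != 0
        has_base = i > 0 and not s[i - 1].isspace()
        if is_mark and not has_base:
            indices.append(i)
    return indices

text = "صَوتُ صَفيرِ البُلْبُلِ"
text2 = text.replace("ص", "")

# ascii() escapes every non-ASCII character, so rendering cannot hide anything.
print(ascii(text2))

# Name the marks that were left behind without a letter to attach to.
for i in orphaned_marks(text2):
    print(i, unicodedata.name(text2[i]))
```

If the vowel marks that used to sit on the removed letter show up here, that would explain why the terminal or browser renders text2 oddly and why copying the rendered text back does not round-trip.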
Let's not rely on the direct printout then, but instead look at one character at a time:
import itertools, unicodedata
text = "صَوتُ صَفيرِ البُلْبُلِ"
text2 = text.replace("ص", "")
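def compare_texts(text1, text2):
    # Pair up characters by position; pad the shorter string with "".
    for c1, c2 in itertools.zip_longest(text1, text2, fillvalue=""):
        # unicodedata.name() needs a single character, so map the padding to ''.
        name1 = unicodedata.name(c1) if c1 else ''
        name2 = unicodedata.name(c2) if c2 else ''
        print(f"{name1:<20} : {name2:<20}")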
compare_texts(text, text2)
From the output of the above, we see that text2 indeed is just text with ARABIC LETTER SAD ('ص') missing in two places.
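The same conclusion can be checked without reading the two columns by eye. The following is a minimal sketch; removed_characters is a hypothetical helper name, and it only counts characters, so it says nothing about where in the string they were removed.

```python
import unicodedata
from collections import Counter

def removed_characters(before, after):
    """Map the Unicode name of each character to how many more times it occurs in `before`."""
    # Counter subtraction keeps only characters with a positive difference.
    missing = Counter(before) - Counter(after)
    # Fall back to repr() for characters that have no Unicode name.
    return {unicodedata.name(ch, repr(ch)): n for ch, n in missing.items()}

text = "صَوتُ صَفيرِ البُلْبُلِ"
text2 = text.replace("ص", "")

print(removed_characters(text, text2))
```

If the replacement did exactly what it was told, the only entry should be the letter that was passed to replace(), with a count equal to the number of its occurrences.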
In conclusion: str.replace() does what you want (or at least what you tell it to do); it just might not look like it in the (naïvely) printed output.
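If what you actually want is to drop the letter together with the vowel marks attached to it, something along these lines might work. This is a sketch under assumptions: remove_letter_with_marks is a hypothetical helper, and it only treats the code points U+064B to U+0652 (the tanwin marks, fatha, damma, kasra, shadda and sukun) as marks that belong to the preceding letter, which may not cover every mark in your data.

```python
import re

# Assumed set of Arabic vowel marks: U+064B (FATHATAN) through U+0652 (SUKUN).
ARABIC_MARKS = "[\u064B-\u0652]*"

def remove_letter_with_marks(s, letter):
    """Remove every occurrence of `letter` plus any vowel marks directly after it."""
    pattern = re.escape(letter) + ARABIC_MARKS
    return re.sub(pattern, "", s)

text = "صَوتُ صَفيرِ البُلْبُلِ"
text3 = remove_letter_with_marks(text, "ص")

# Escaped form, so the result can be inspected independently of rendering.
print(ascii(text3))
```

Unlike plain str.replace(), this should not leave a mark behind without a base letter, so the printed result is more likely to render as expected.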
Bonus
Here's a short video describing how/why Arabic and other non-Latin writing systems are more complicated than the Latin one.