Not A Duplicate
I don't think this is a duplicate of this question. The answer there explains how to fix the problem in Python 2 and says that it should not occur in Python 3. Also, the answer provided does not work for me:
>>"ć́".decode()
AttributeError: 'str' object has no attribute 'decode'
>>len(u"ć́")
2
Original Question:
I am importing book data from a website and then processing it. One of the first steps is to do some work with the length of a certain string. Unfortunately, the len() function sometimes returns an unexpected value when "abnormal" characters are included:
>>len("Krste Asanović́ ... [et al.].")
29
>>ord("ć́")
TypeError: ord() expected a character, but string of length 2 found
Here the "ć́" is not a standard character. If I replace it with a normal "c", I get a different result:
>>len("Krste Asanovic ... [et al.].")
28
I can, of course, solve the problem using replace():
>>"Krste Asanović́ ... [et al.].".replace("ć́","c")
'Krste Asanovic ... [et al.].'
But is there a way to "forbid" weird letters in the first place?
EDIT
>>list("ć́")
['ć', '́']
I'm using Python 3.6.
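To narrow down what list() is showing, here is a minimal sketch that prints the code point, Unicode name, and combining class of each element. It assumes the problem character is U+0107 followed by U+0301, which is my guess from the output above and not something I have confirmed for every record in the data.

```python
import unicodedata

# Assumption: the problem character is U+0107 (c with acute)
# followed by U+0301 (a combining acute accent).
s = "\u0107\u0301"

for ch in s:
    code_point = f"U+{ord(ch):04X}"
    name = unicodedata.name(ch, "<unnamed>")
    # A combining class of 0 means a base character; non-zero means
    # the character attaches to the one before it.
    combining_class = unicodedata.combining(ch)
    print(code_point, name, combining_class)
```

If that assumption holds, the second element is a combining mark and not a letter of its own, which would explain why len() counts one more than I expect.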
EDIT 2
This does nothing:
>>"ć́".replace("´","")
"ć́"
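A possible explanation, as a sketch only: the accent I typed in replace() may be a different character from the one attached to the letter. The code below assumes that the typed accent is U+00B4 (the standalone acute accent) and that the one in the string is U+0301 (the combining acute accent); both code points are assumptions on my part.

```python
# Assumption: the problem string is U+0107 followed by U+0301.
s = "\u0107\u0301"

spacing_acute = "\u00b4"    # standalone ACUTE ACCENT, as typed on a keyboard
combining_acute = "\u0301"  # COMBINING ACUTE ACCENT, attaches to the previous letter

# The two accents look alike but are different code points.
print(spacing_acute == combining_acute)

# Replacing the standalone accent leaves the string untouched,
# while replacing the combining accent removes one code point.
unchanged = s.replace(spacing_acute, "")
stripped = s.replace(combining_acute, "")
print(len(unchanged), len(stripped))
```

Under those assumptions, replace() has nothing to match on, so the string comes back unchanged.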