I am given a string of Hebrew characters (and some Arabic ones; I know neither language) in a file:
צוֹר
I load this string from the file in Python 3:
fin = open("filename", encoding="utf-8")
x = next(fin).strip()
The length of x appears to be 5:
>>> len(x)
5
Its UTF-8 encoding is:
>>> x.encode("utf-8")
b'\xd7\xa6\xd7\x95\xd6\xb9\xd7\xa8\xe2\x80\x8e'
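To see which code points those bytes decode to, each character can be listed with its Unicode general category and name using the standard `unicodedata` module. This is a minimal sketch: the string is rebuilt from the bytes shown above rather than read from the file, and `describe` is a hypothetical helper name used only for illustration.

```python
import unicodedata

# The string from the question, rebuilt from its UTF-8 bytes.
x = b'\xd7\xa6\xd7\x95\xd6\xb9\xd7\xa8\xe2\x80\x8e'.decode("utf-8")

def describe(s):
    """Return (code point, general category, name) for each character in s."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch, "<unnamed>"))
        for ch in s
    ]

for row in describe(x):
    print(*row)
# U+05E6 Lo HEBREW LETTER TSADI
# U+05D5 Lo HEBREW LETTER VAV
# U+05B9 Mn HEBREW POINT HOLAM
# U+05E8 Lo HEBREW LETTER RESH
# U+200E Cf LEFT-TO-RIGHT MARK
```

Three of the five code points are letters (`Lo`); the other two are a combining vowel point (`Mn`) and an invisible directional mark (`Cf`).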
However, in browsers the string is clearly displayed as 3 Hebrew characters.
How can I get the length properly? And why does this happen?
I am aware that Python 3 strings are Unicode by default, so I did not expect such an issue.
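For context, `len` counts Unicode code points, not the characters a browser draws. One rough way to get a count closer to what is displayed is to skip combining marks and invisible format characters. The sketch below assumes that this is what "length" should mean here; `visible_length` is a hypothetical name, and this is only an approximation, not full grapheme-cluster segmentation.

```python
import unicodedata

def visible_length(s):
    """Count characters that are neither combining marks nor format controls."""
    # Mn/Me: nonspacing and enclosing marks (e.g. Hebrew vowel points).
    # Cf: invisible format characters (e.g. the left-to-right mark U+200E).
    skipped = {"Mn", "Me", "Cf"}
    return sum(1 for ch in s if unicodedata.category(ch) not in skipped)

# The same five code points as in the question, written as escapes.
x = "\u05e6\u05d5\u05b9\u05e8\u200e"

print(len(x))             # 5
print(visible_length(x))  # 3
```

Spacing combining marks (category `Mc`) and multi-code-point sequences such as emoji are not handled by this filter, so text in other scripts may need a proper grapheme segmentation library instead.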