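You need to open the file with the correct encoding. The utf-8-sig codec decodes UTF-8 and skips a leading byte order mark (BOM) if one is present. In Python 3, pass it to open() via the encoding argument: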
with open("myfile.txt", "r", encoding="utf-8-sig") as myfile:
    contents = myfile.read()
for char in contents:
    # do something with character
    pass
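To see what the codec choice changes, here is a minimal Python 3 sketch. It assumes a throwaway temporary file (not the myfile.txt from above) that starts with a BOM, and reads it once with utf-8 and once with utf-8-sig. The names path, with_bom and without_bom are made up for the illustration.

```python
import codecs
import os
import tempfile

# Create a small UTF-8 file that starts with a byte order mark (BOM).
# The temporary file is only a stand-in for a real input file.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(codecs.BOM_UTF8 + "héllo\n".encode("utf-8"))

# Plain utf-8 keeps the BOM as the first character (U+FEFF).
with open(path, "r", encoding="utf-8") as f:
    with_bom = f.read()

# utf-8-sig skips the BOM while decoding.
with open(path, "r", encoding="utf-8-sig") as f:
    without_bom = f.read()

os.remove(path)

print(repr(with_bom[0]))     # the BOM character
print(repr(without_bom[0]))  # the first real character
```

With plain utf-8, the first character in the loop is the BOM rather than the first letter of the text.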
In Python 2, you can use the codecs module:
import codecs
with codecs.open("myfile.txt", "r", encoding="utf-8-sig") as myfile:
    contents = myfile.read()
for char in contents:
    # do something with character
    pass
Note that codecs.open() always opens the underlying file in binary mode, so Python 2 will not do automatic newline conversion in this case; you need to handle \r\n line endings explicitly.
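One possible way to handle the line endings yourself is to normalize them after decoding. This is only a sketch: the temporary file and the names raw and normalized are assumptions for the demonstration, and the two replace() calls are one simple strategy among several.

```python
import codecs
import os
import tempfile

# Demo input: a BOM followed by two lines with Windows line endings.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(codecs.BOM_UTF8 + b"one\r\ntwo\r\n")

# codecs.open() decodes the bytes but leaves \r\n untouched.
with codecs.open(path, "r", encoding="utf-8-sig") as myfile:
    raw = myfile.read()

os.remove(path)

# Convert \r\n first, then any remaining lone \r, to \n.
normalized = raw.replace("\r\n", "\n").replace("\r", "\n")
```

Replacing \r\n before the lone \r matters: the other order would turn each \r\n into two newlines.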
As an alternative (Python 2), you can open the file in text mode and decode the contents afterwards. With mode "rU" (universal newlines), line endings are normalized to \n on every platform; plain "r" only translates \r\n on Windows:
with open("myfile.txt", "rU") as myfile:
    contents = myfile.read().decode("utf-8-sig")
for char in contents:
    # do something with character
    pass
Note that in both cases you end up with unicode objects in Python 2, not byte strings (str). In Python 3, all strings are Unicode.
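A further option, not covered above, is io.open(), which exists in Python 2.6+ and is the same function as the built-in open() in Python 3. It accepts an encoding argument and translates newlines in text mode by default, so the same code should work on both versions. The sketch below again uses a temporary file as a stand-in for real input.

```python
import codecs
import io
import os
import tempfile

# Demo input: a BOM followed by two lines with Windows line endings.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(codecs.BOM_UTF8 + b"one\r\ntwo\r\n")

# io.open() decodes with the given codec and, in text mode,
# translates \r\n to \n by default.
with io.open(path, "r", encoding="utf-8-sig") as myfile:
    contents = myfile.read()

os.remove(path)

# Iterating over the decoded text yields one character at a time.
chars = list(contents)
```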