I'm trying to log the contents of a file, but I get some funny behavior from the logging module (and not only that one).
Here are the file contents:
"Testing …"
Testing å¨'æøöä
"Testing å¨'æøöä"
And here is how I open and log it:
with codecs.open(f, "r", encoding="utf-8") as myfile:
    script = myfile.read()
log.debug("Script type: {}".format(type(script)))
print(script)
log.debug("{}".format(script.encode("utf8")))
The line where I log the type of the object shows up as follows in my logs:
Script type: <type 'unicode'>
Then the print ... line prints the contents correctly to the console, but the logging module throws an exception:
Traceback (most recent call last):
  File "/usr/lib/python2.7/logging/__init__.py", line 882, in emit
    stream.write(fs % msg.encode("UTF-8"))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 63: ordinal not in range(128)
When I remove the .encode("utf8") bit from that last line, I get the expected exception:
'ascii' codec can't encode character u'\u2026' in position 9: ordinal not in range(128)
This is just to demonstrate the problem. It's not only the logging module: the rest of my code also throws similar exceptions when dealing with this "unicode" string.
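For reference, both error types can be reproduced without the logging module or the file. This is only a minimal sketch: `text` is a hypothetical stand-in for the first line of my file (quotes included), and the two helper functions are names I made up for illustration. It makes the ASCII conversions explicit, so it is not necessarily what the logging module does internally.

```python
# Hypothetical stand-in for the first line of the file, including the quotes.
# \u2026 is the ellipsis character.
text = u'"Testing \u2026"'


def try_ascii_encode(s):
    """Encode a unicode string as ASCII; return the error message, or None."""
    try:
        s.encode("ascii")
    except UnicodeEncodeError as exc:
        return str(exc)
    return None


def try_ascii_decode(b):
    """Decode a byte string as ASCII; return the error message, or None."""
    try:
        b.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# Unicode string -> ASCII bytes: fails on the ellipsis character.
encode_error = try_ascii_encode(text)

# UTF-8 bytes -> ASCII text: fails on 0xe2, the first byte of the encoded ellipsis.
decode_error = try_ascii_decode(text.encode("utf-8"))

print(encode_error)
print(decode_error)
```

Both messages point at the ellipsis, which suggests that an ASCII conversion happens somewhere whether or not I call .encode("utf8") myself.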
What am I doing wrong?