Steps to reproduce:
- Create a file
test.txt
with contentThis is 中文
(i.e., UTF-8 encoded non-ASCII text). - Custom-compile python 3.5.2 on the Intel Edison.
Launch the custom-compiled python3 interpreter and issue the following piece of code:
with open('test.txt', 'r') as fh: fh.readlines()
Actual behavior:
A UnicodeDecodeError
exception is thrown. The file is opened as 'ASCII' instead of 'UTF-8' by default:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 8: ordinal not in range(128)
On a "regular" Linux system this problem is easily solved by setting a proper locale, see e.g. this post or that post. On the Intel Edison, however, I cannot set the LC_CTYPE
since the default Yocto Linux distribution is missing locales (see e.g. this page).
I also tried to use a couple of other hacks like
import sys; sys.getfilesystemencoding = lambda: 'UTF-8'
import locale; locale.getpreferredencoding = lambda: 'utf-8'
And I tried setting the PYTHONIOENCODING=utf8
environment variable before starting the python interpreter.
However, none of this works. The only workaround is to specify the encoding explicitly as a command line parameter to the open
command. This works for the above snippet, but it won't set the system-wide default for all packages I am using (that will implicitly open files as ASCII and may or may not offer me a way to override that default behavior).
What is the proper way to set the python interpreter default filesystem encoding? (Of course without installing unneeded system-wide locales.)