1

I have a quite odd problem with PyCharm and a Python app that I am working on.

  • IDE: PyCharm Community Edition 2016.3.2
  • Project interpreter: Python 3.6.0
  • OS: macOS Sierra

I have been googling for a solution for some time, and none of the proposed ideas has helped, so I want to ask here.

I want to open a UTF-8 encoded file using the following code:

#!/usr/bin/env python3    

import os, platform

def read(file):
    f = open(file, "r")
    content = f.read()
    f.close()
    return content

print(platform.python_version())
print(os.environ["PYTHONIOENCODING"])

content = read("testfile")
print(content)

The code crashes when run in PyCharm. The output is

3.6.0
UTF-8
Traceback (most recent call last):
  File "/Users/xxx/Documents/Scripts/pycharmutf8/file.py", line 14, in <module>
    content = read("testfile")
  File "/Users/xxx/Documents/Scripts/pycharmutf8/file.py", line 7, in read
    content = f.read()
  File "/usr/local/Cellar/python3/3.6.0_1/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 5: ordinal not in range(128)

When I run the identical code from the command line, it works just fine:

./file.py 
3.6.0
utf-8:surrogateescape
I am a file with evil unicode characters: äöü

I have found that in comparable situations people are advised to set the environment variable PYTHONIOENCODING to utf-8:surrogateescape, which I did (as you can see in the output above) system-wide

export PYTHONIOENCODING=utf-8:surrogateescape

but also in PyCharm itself (Settings -> Build -> Console -> Python Console -> Environment variables).

This does not have any effect. Do you have further suggestions?

drscheme
  • `open` uses `locale.getpreferredencoding(False)` to guess at the encoding, so my guess is that the locale set by PyCharm is different from your terminal. – MatsLindh Mar 15 '17 at 14:20

2 Answers

3

If changing the encoding in the open call is hard, e.g. because the call happens inside a library, you can instead set this environment variable in the run configuration: LC_CTYPE=en_US.UTF-8

Source: PyCharm is changing the default encoding in my Django app
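
One thing to watch when setting LC_CTYPE: under POSIX rules LC_ALL overrides LC_CTYPE, which in turn overrides LANG, so the run-configuration entry has no effect if LC_ALL is also set. Below is a minimal sketch for checking which variable wins in a given environment. The helper name effective_ctype is made up for illustration and is not part of any library.

```python
import os

# POSIX precedence for the character-type locale category:
# LC_ALL overrides LC_CTYPE, which overrides LANG.
_PRECEDENCE = ("LC_ALL", "LC_CTYPE", "LANG")


def effective_ctype(env=None):
    """Return (variable, value) for the first non-empty locale variable.

    Returns (None, None) if none of the three variables is set.
    Empty values are treated as unset, as POSIX specifies.
    """
    env = os.environ if env is None else env
    for name in _PRECEDENCE:
        value = env.get(name)
        if value:
            return name, value
    return None, None


# Show what the current process inherited (run this inside PyCharm and in the terminal).
print(effective_ctype())
```

If this prints (None, None) inside PyCharm but a UTF-8 locale in the terminal, that would be consistent with the ASCII fallback seen in the traceback.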

Rajas Agashe
1

If you want to read a UTF-8 file, specify the encoding:

def read(file):
    # Pass the encoding explicitly so the result does not depend on the locale.
    with open(file, encoding='utf8') as f:
        content = f.read()
    return content
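
To see that the explicit encoding is what matters, here is a small self-contained sketch. It writes a temporary UTF-8 file (the sample text and the helper name read_text are only for illustration), reads it back with encoding="utf-8", and then shows that forcing encoding="ascii" fails with the same kind of UnicodeDecodeError as in the question.

```python
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Read a text file with an explicit encoding instead of the locale default."""
    with open(path, encoding=encoding) as f:
        return f.read()


# Sample content for illustration; it contains non-ASCII characters.
sample = "I am a file with evil unicode characters: äöü\n"

# delete=False so the file can be reopened by name on every platform.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    # Explicit UTF-8 works regardless of the process locale.
    utf8_result = read_text(path)

    # Forcing ASCII mimics what happens when the locale default falls back to ASCII.
    try:
        read_text(path, encoding="ascii")
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True
finally:
    os.remove(path)

print(utf8_result == sample, ascii_failed)
```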
Mark Tolonen
  • Thanks. That solved my issue. However, I wonder why the same code behaved differently in the Terminal (worked) and in PyCharm (did not work). – drscheme Mar 15 '17 at 15:36
  • As @Mats's comment says, PyCharm and the terminal must be set to different locales. The error indicates ASCII was the default, which Python chooses when it can't figure out the locale. – Mark Tolonen Mar 15 '17 at 18:37
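
To compare the two environments directly, a short diagnostic along these lines can be run once from PyCharm and once from the terminal. This is only a sketch (the function name encoding_report is made up). Note that PYTHONIOENCODING only affects stdin, stdout and stderr, not the default used by open, which would explain why setting it did not help here.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the settings that influence text encoding in this process."""
    report = {
        # The value open() falls back to when no encoding is given.
        "locale.getpreferredencoding(False)": locale.getpreferredencoding(False),
        # Governed by PYTHONIOENCODING if set; unrelated to open().
        "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
        "sys.getfilesystemencoding()": sys.getfilesystemencoding(),
    }
    # Environment variables that commonly differ between an IDE and a shell.
    for name in ("PYTHONIOENCODING", "LC_ALL", "LC_CTYPE", "LANG"):
        report[name] = os.environ.get(name)
    return report


for key, value in encoding_report().items():
    print(f"{key}: {value}")
```

Any line that differs between the two runs points at the setting PyCharm does not inherit from the shell.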