
I ran into an obscure problem when an exception raised in Python is printed to the Windows console. When the exception message contains a non-ASCII Unicode character, the message is either not printed at all or printed incorrectly. The console encoding is cp866.

With the Python default encoding set to ascii, this statement:

raise LookupError(u"symbol: \u0411")

is printed as:

LookupError


When I set the default encoding to utf-8, I get:

LookupError: symbol: ╨С


When I do

print u"symbol: \u0411"

In both cases I get:

symbol: Б

Why is there a difference in behaviour? What is the right way to handle this?

hippietrail
Unicorn
  • Apparently py3 has better support for this kind of stuff ... I do localizations and I struggle with this stuff all the time... – Joran Beasley Sep 18 '12 at 21:14
  • "When I set default encoding to utf-8 I get..." - are you talking about the PYTHONIOENCODING environment variable or about setting the encoding in the header of your module? – Boris Burkov Sep 18 '12 at 21:17
  • Bob, I use import sys, reload(sys), sys.setdefaultencoding("utf-8") – Unicorn Sep 18 '12 at 21:20

1 Answer


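When an exception with a Unicode message is about to be printed, Python tries to encode the message using the encoding returned by sys.getdefaultencoding(). With ascii the encoding fails, the error is suppressed, and only the exception name is printed. With utf-8 the encoding succeeds, but the resulting UTF-8 bytes are written to a console that interprets them as cp866, which produces the garbled output.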
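The three outputs from the question can be reproduced without a console, purely with encode/decode calls. This is only a sketch of the mechanism described above, not of what the interpreter does internally; the variable names are made up for illustration.

```python
message = u"symbol: \u0411"

# Default encoding ascii: the message cannot be encoded at all,
# so the traceback printer falls back to the bare exception name.
try:
    message.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Default encoding utf-8: encoding succeeds, but a cp866 console
# interprets the two UTF-8 bytes (D0 91) as two cp866 characters.
utf8_bytes = message.encode("utf-8")
shown_on_cp866_console = utf8_bytes.decode("cp866")

# print: the text is encoded with sys.stdout.encoding (cp866 here),
# so the console decodes it back to the original character.
cp866_bytes = message.encode("cp866")
round_trip = cp866_bytes.decode("cp866")
```

Here shown_on_cp866_console is the mojibake string from the question (a box-drawing character followed by a Cyrillic letter), while round_trip equals the original message.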

In the print case, the Unicode string is encoded using sys.stdout.encoding, which matches the console. It would probably be better if the excepthook used sys.stderr.encoding rather than sys.getdefaultencoding().

Note that the following works.

raise LookupError(u"symbol: \u0411".encode(your_encoding))
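One way to avoid hard-coding your_encoding is to take it from sys.stderr at the point where the message is built. The helper below is hypothetical (console_message is not part of Python), and it assumes that falling back to ascii with "replace" is acceptable when stderr has no encoding, for example when it is redirected to a file.

```python
import sys


def console_message(text, encoding=None):
    """Encode a Unicode message for display on the console (hypothetical helper)."""
    if encoding is None:
        # sys.stderr.encoding can be None or missing when stderr is redirected.
        encoding = getattr(sys.stderr, "encoding", None) or "ascii"
    # "replace" turns unencodable characters into "?" instead of raising.
    return text.encode(encoding, "replace")


# Python 2 usage, following the workaround above:
#     raise LookupError(console_message(u"symbol: \u0411"))
```

This only makes sense in Python 2, where exception messages may be byte strings; in Python 3 the message should stay a str.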

You can also change the default encoding in your sitecustomize or usercustomize module by calling sys.setdefaultencoding(your_encoding). It has to be done there because site.py deletes sys.setdefaultencoding once startup finishes. Your system should be configured so that the default encoding is equal to sys.stderr.encoding (and to the encoding of the other standard streams).

Also, this problem no longer exists in Python 3.

user87690