
In Python (2.7.1):

>>> x = u'$€%'
>>> x.find('%')
2
>>> len(x)
3

Whereas in IPython:

>>> x = u'$€%'
>>> x.find('%')
4
>>> len(x)
5

What's going on here?


edit: including the additional info requested from the comments below

ipython

>>> import sys, locale
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding(locale.getdefaultlocale()[1])
>>> sys.getdefaultencoding()
'UTF8'
>>> x = u'$€%'
>>> x
u'$\xe2\x82\xac%'
>>> print x
$â¬%
>>> len(x)
5

python

>>> import sys, locale
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding(locale.getdefaultlocale()[1])
>>> sys.getdefaultencoding()
'UTF8'
>>> x = u'$€%'
>>> x
u'$\u20ac%'
>>> print x
$€%
>>> len(x)
3
wim
  • What happens if you type `x` on the command line in CPython? What character encodings are your shells using? – Tim Pietzcker Sep 29 '11 at 06:53
  • `print x` gives `$€%`, whereas just `x` gives `u'$\u20ac%'` – wim Sep 29 '11 at 06:57
  • sys.stdin.encoding is 'UTF-8' – wim Sep 29 '11 at 08:23
  • This is certainly a bug in the ipython shell. It shouldn't affect your running programs though. Broken Unicode terminals are quite common in general (just look at the Windows console...) so relying on being able to type and print Unicode to console, in any language, is generally pretty iffy. It will work fine inside scripts themselves and for other methods of IO. – bobince Sep 29 '11 at 19:49

2 Answers


@nye17 It's officially not a good idea to ever call setdefaultencoding() (it is deleted from sys during interpreter startup for a reason, which is why reload(sys) is needed to get it back). One common culprit is gtk, which causes all kinds of problems: if IPython has imported gtk, sys.getdefaultencoding() will return utf8. IPython does not set the default encoding itself.

@wim can I ask what version of IPython you are using? Part of the major overhaul in 0.11 was fixing many unicode bugs, but more do crop up (mostly on Windows, now).

I ran your test case in IPython 0.11, and IPython and Python behave the same there, so I think this bug is fixed.

Relevant values:

  • sys.stdin.encoding = utf8
  • sys.getdefaultencoding() = ascii
  • platforms tested: Ubuntu 10.04+Python2.6.5, OSX 10.7+Python2.7.1
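To compare these values between two shells, something like the following sketch can be pasted into each one. The helper name `encoding_report` is hypothetical, and the `locale.getpreferredencoding` entry is an extra data point that was not part of the values listed above.

```python
import locale
import sys


def encoding_report():
    """Collect the encoding settings relevant to this question (hypothetical helper)."""
    stdin = getattr(sys, 'stdin', None)
    return {
        # Encoding the interpreter believes the terminal input uses; may be None.
        'stdin_encoding': getattr(stdin, 'encoding', None),
        # Implicit str/unicode conversion encoding ('ascii' on a stock Python 2).
        'default_encoding': sys.getdefaultencoding(),
        # Extra, not from the answer: what the locale settings suggest.
        'preferred_locale_encoding': locale.getpreferredencoding(False),
    }


for name, value in sorted(encoding_report().items()):
    print('%s = %s' % (name, value))
```

If the two shells report the same values but still disagree on `len(x)`, the difference is more likely in how the shell handles input than in the interpreter's settings.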

As for an explanation, essentially IPython didn't recognize that input could be unicode. In IPython 0.10, multibyte utf8 input is not decoded, so each byte of the input is treated as one character. The three utf8 bytes of € therefore become three separate characters, which you can see with:

In [1]: x = '$€%'

In [2]: x
Out[2]: '$\xe2\x82\xac%'

In [3]: y = u'$€%'

In [4]: y
Out[4]: u'$\xe2\x82\xac%'  # wrong!

Whereas, what should happen, and what does happen in 0.11, is that y == x.decode(sys.stdin.encoding), not repr(y) == 'u'+repr(x).
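The difference can be reproduced outside any shell. This is a minimal sketch, assuming the terminal sends utf8 (the encoding is hard-coded here in place of `sys.stdin.encoding`) and modelling the 0.10 behaviour as a latin-1 decode, which maps every byte to the code point with the same number. That model is an illustration, not IPython's actual code path.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# The utf8 bytes a terminal sends for the three characters $, €, %.
raw = b'$\xe2\x82\xac%'

# Correct handling: decode the bytes with the terminal encoding
# (assumed to be utf8 here instead of reading sys.stdin.encoding).
correct = raw.decode('utf-8')

# Buggy handling, modelled as latin-1: each byte becomes one character.
buggy = raw.decode('latin-1')

print(repr(correct), len(correct), correct.find(u'%'))  # length 3, '%' at index 2
print(repr(buggy), len(buggy), buggy.find(u'%'))        # length 5, '%' at index 4
```

The lengths and indices match the two sessions in the question: 3 and 2 for plain Python, 5 and 4 for the affected IPython.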

minrk
  • You're correct, it was an ipython bug. I built 0.12.dev and it's right now. Oh, the new inline qtconsole is very cool too!! – wim Sep 30 '11 at 09:24

If you run

import sys
sys.getdefaultencoding()

I think you will get different results in python and ipython, possibly ascii in one and utf-8 in the other, so it may only be a matter of which default encoding each one is choosing.

The other test you can do is to type the following to enforce your locale's encoding as the default,

import sys, locale
reload(sys)
sys.setdefaultencoding(locale.getdefaultlocale()[1])
sys.getdefaultencoding()

then try the test of x in your question.

nye17
  • Both ipython and python give me 'ascii'. – wim Sep 29 '11 at 07:06
  • @wim that is disturbing. I just tested in my ipython and python, the `len` and `print` give the same output as in your question, but my default encodings are different, one `ascii`, the other `utf-8`. – nye17 Sep 29 '11 at 07:10
  • @wim can you try the test I edited in the answer? basically to enforce the same encoding in both places to see whether strings are behaving in the same way. – nye17 Sep 29 '11 at 07:25
  • In the Notebook, the `sys` module doesn't have the `setdefaultencoding()` method, nor is the builtin `reload()` function available, so I have no idea how to get unicode working in the IPy Notebook. – taylor May 26 '13 at 17:32