In Python, why does calling a string, "X", display it in ASCII, but calling "print X" display it in unicode?

Question

I've got a list of strings, along the lines of list=[a,b,c,d,e].

When I call list[2], the string c is displayed as ASCII; when I call print list[2], however, it's displayed as unicode. Why does this discrepancy exist?

For similar reasons to why `"123"` displays differently than `print "123"`. — Scott Hunter, Feb 09 '16 at 17:47
Could you show an *unedited* transcript of the phenomenon, please? We don't know what you mean by "calling" - neither strings nor `print` statements are "callable" in Python jargon - and we also don't know what you mean by "ascii" and "unicode". — zwol, Feb 09 '16 at 17:47

score 3 · Accepted Answer · answered Feb 09 '16 at 18:10

This is mainly because strings in Python 2 are not text strings but byte strings.

I suppose you are in a REPL environment (a Python console). When you evaluate an expression in the console, it shows the printed representation of the result, which is the same as calling print repr() on the expression:

l = ['ñ']
l[0] # should output '\xc3\xb1'
print repr(l[0]) # should output the same

This is because your console is in UTF-8 mode, so when you type ñ you are actually entering two bytes, 0xc3 and 0xb1. (If you get a different representation for ñ, it is because your console uses some other encoding.)

repr() is a built-in Python function that always returns a string. For primitive types, this string is valid source code that rebuilds the value passed as the parameter. In this case it returns a string of escaped bytes which, when evaluated, recreates another string with the ñ encoded as UTF-8. To see this:

repr(l[0]) # should print a string within a string: "'\\xc3\\xb1'"

So when you print that repr() string (which is what the console does when you just evaluate l[0]), you get the same string without the outer quotes and with the doubled backslashes collapsed. I.e.:

print repr(l[0]) # should output '\xc3\xb1'

But when you print the value itself, i.e. print l[0], you send those two bytes to the console. As the console is in UTF-8 mode, it decodes the sequence and translates it into a single character: ñ. So:

print l[0] # should output ñ
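The same distinction can be reproduced on a Python 3 interpreter, where the bytes type is the closest counterpart to a Python 2 str. This is only a sketch under that assumption, and the variable names raw, escaped and decoded are made up for illustration:

```python
# Python 3 sketch: bytes stands in for a Python 2 byte string.
raw = '\u00f1'.encode('utf-8')   # the two bytes a UTF-8 console sends for n-tilde

escaped = repr(raw)              # what the REPL shows when you evaluate raw
decoded = raw.decode('utf-8')    # what a UTF-8 console does with those bytes

print(len(raw))      # 2
print(escaped)       # b'\xc3\xb1'
print(len(decoded))  # 1
```

Evaluating gives you the escaped bytes; decoding them as UTF-8 gives back the single character.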

If you want to store text strings, you must put the prefix u before the string literal. This way:

text = u'ñ'

Now, when evaluating text you will see its Unicode codepoint:

text # should output u'\xf1'

And printing it should recreate the ñ glyph:

print text # should output `ñ`

If you want to convert text into a byte string representation, you need an encoding scheme (such as UTF-8):

text.encode('utf-8') == l[0] # should output True

Similarly, if you want the Unicode representation for l[0], you'll need to decode those bytes:

l[0].decode('utf-8') == text # should output True
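To see why the encoding has to be named explicitly, here is a small sketch (written for Python 3, which is an assumption on my part since the question is about Python 2). It encodes the same character with two encodings and then decodes with the wrong one. The names utf8_bytes, latin1_bytes and mojibake are just illustrative:

```python
text = '\u00f1'                          # n-tilde, code point U+00F1

utf8_bytes = text.encode('utf-8')        # two bytes
latin1_bytes = text.encode('latin-1')    # one byte

print(utf8_bytes)     # b'\xc3\xb1'
print(latin1_bytes)   # b'\xf1'

# Decoding with the matching encoding gives the original text back.
round_trip = utf8_bytes.decode('utf-8')
print(round_trip == text)   # True

# Decoding with the wrong encoding does not fail here, it just yields
# two unrelated characters instead of one.
mojibake = utf8_bytes.decode('latin-1')
print(len(mojibake))        # 2
print(ascii(mojibake))      # '\xc3\xb1'
```

This is also why a console that is not in UTF-8 mode would show a different representation for the same key press.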

All this said, note that in Python 3 plain string literals are Unicode strings, and you need to prefix the literal with b to produce a byte string.
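A quick way to check this on a Python 3 interpreter (a sketch, assuming Python 3.3 or later so that the u prefix is still accepted):

```python
text = u'\u00f1'      # the u prefix is allowed but redundant in Python 3.3+
data = b'\xc3\xb1'    # the b prefix is required for a byte string

print(type(text).__name__)   # str
print(type(data).__name__)   # bytes

# A str and a bytes object never compare equal, even if the bytes
# are a valid encoding of the text.
same_without_decode = (text == data)
same_after_decode = (text == data.decode('utf-8'))
print(same_without_decode)   # False
print(same_after_decode)     # True

# ascii() escapes non-ASCII characters, much like the Python 2 console did.
print(ascii(text))           # '\xf1'
```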

score 2 · Answer 2 · answered Feb 09 '16 at 17:52

It's because those two ways of displaying a string use different routes to get to the final result. x by itself in the REPL will invoke repr(x) and display that, but print(x) will invoke str(x) and display that instead. Classes are allowed to define __repr__ and __str__ separately, so they don't always return the same value.

>>> x = u"a"
>>> x
u'a'
>>> print x
a
>>> repr(x)
"u'a'"
>>> str(x)
'a'
>>>
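To illustrate the point about classes, here is a minimal sketch with a made-up class (Temperature is purely hypothetical) that defines both methods. It runs the same way on Python 2 and Python 3:

```python
class Temperature(object):
    def __init__(self, degrees):
        self.degrees = degrees

    def __repr__(self):
        # Used by the REPL and by repr(); aims to look like source code.
        return 'Temperature({!r})'.format(self.degrees)

    def __str__(self):
        # Used by print and by str(); aims to be readable.
        return '{} degrees'.format(self.degrees)


t = Temperature(21)
shown_by_repl = repr(t)
shown_by_print = str(t)
shown_in_list = str([t])   # containers use repr() for their elements

print(shown_by_repl)    # Temperature(21)
print(shown_by_print)   # 21 degrees
print(shown_in_list)    # [Temperature(21)]
```

The last line is relevant to the question: a list always shows the repr() of its items, even when the list itself is printed.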
