I was reading PEP 263 and got stuck on this paragraph:

In Python 2.1, Unicode literals can only be written using the Latin-1 based encoding "unicode-escape". This makes the programming environment rather unfriendly to Python users who live and work in non-Latin-1 locales such as many of the Asian countries. Programmers can write their 8-bit strings using the favorite encoding, but are bound to the "unicode-escape" encoding for Unicode literals.

What does "unicode-escape" mean? How did people from Asia write Python files? Can someone show me how they wrote Python source code before Python 2.3? I just can't see why PEP 263 was introduced: I installed Python 2.1.3 on my machine and ran `python cod.py`, where cod.py is a file encoded in UTF-8, and everything just worked fine.

zer0uno

1 Answer

A Unicode escape is a sequence of the form '\xab': the \x means to take the next two characters and interpret them as a hexadecimal code that produces a single character. Unicode literals also accept the longer form '\uXXXX', with four hex digits.

Characters in Asian languages often take more than a single byte, so, for example, the character 草 is the three bytes '\xe8\x8d\x89' when encoded as UTF-8.

You could not use these characters directly in source code, except as part of a string literal (or perhaps a comment).
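
To make the difference between the escapes and the raw bytes concrete, here is a minimal sketch. It assumes a modern Python 3 interpreter rather than Python 2.1 (which I have not run), so it only illustrates how the "unicode-escape" codec and the \x and \u escapes behave today; the variable names are made up for the example.

```python
# Python 3 sketch; in Python 2 the text literals would need a u'' prefix.
ch = "\u8349"  # the character U+8349, written as a Unicode escape

# The same character as UTF-8 bytes: three bytes, as quoted above.
utf8_bytes = ch.encode("utf-8")

# The "unicode-escape" codec spells non-ASCII characters as ASCII escapes.
escaped = ch.encode("unicode_escape")
round_trip = escaped.decode("unicode_escape")

# Inside a text literal, each \xNN escape is one code point, so this is
# three Latin-1 characters rather than a single CJK character.
three_chars = "\xe8\x8d\x89"

print(utf8_bytes)
print(escaped)
print(len(three_chars))
```

If this behaviour carries over to old Unicode literals, the escape that names the character itself is the \u form, while the \x bytes only match it once they are decoded as UTF-8.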

Mark Ransom
  • I don't understand: in Python 2.1, couldn't I write `x = u'草'`? Did I have to write `x = u'\xe8\x8d\x89'`? – zer0uno Jan 29 '15 at 00:11
  • @antox that's correct, because `草` doesn't exist in the Latin-1 character set. At least that's what I gather from the piece you quoted, I don't have any direct experience. – Mark Ransom Jan 29 '15 at 03:10