Questions tagged [python-unicode]

Python distinguishes between byte strings and Unicode strings. *Decoding* transforms byte strings into Unicode; *encoding* transforms Unicode strings into bytes.

Remember: decode your input to Unicode, work with Unicode internally, then encode Unicode objects as bytes for output.
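As a rough illustration of that decode, process, encode pattern, here is a minimal Python 3 sketch. The sample bytes, the variable names, and the choice of UTF-8 are assumptions made for the example; real input may use a different encoding.

```python
# Minimal sketch of the decode -> process -> encode pattern (Python 3).
# The sample bytes and the UTF-8 codec are assumptions for illustration only.
raw_input_bytes = b"caf\xc3\xa9"         # bytes as they might arrive from a file or socket

text = raw_input_bytes.decode("utf-8")   # decode at the input boundary: bytes -> str
shouted = text.upper()                   # work with text (str) internally

output_bytes = shouted.encode("utf-8")   # encode at the output boundary: str -> bytes

print(len(raw_input_bytes), len(text))   # 5 bytes, but 4 characters
print(output_bytes)                      # b'CAF\xc3\x89'
```

Keeping the conversion at the edges of the program means the code in between only ever handles text, which is where most UnicodeDecodeError and UnicodeEncodeError surprises are avoided.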

1053 questions
33
votes
2 answers

Unicode Encode Error when writing pandas df to csv

I cleaned 400 Excel files, read them into Python using pandas, and appended all the raw data into one big df. Then, when I try to export it to a CSV with df.to_csv("path",header=True,index=False), I get this error: UnicodeEncodeError: 'ascii' codec…
collarblind
  • 4,549
  • 13
  • 31
  • 49
29
votes
2 answers

Google App Engine: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 48: ordinal not in range(128)

I'm working on a small application using Google App Engine which makes use of the Quora RSS feed. There is a form, and based on the input entered by the user, it will output a list of links related to the input. Now, the application works fine for…
Manas Chaturvedi
  • 5,210
  • 18
  • 52
  • 104
29
votes
8 answers

String.maketrans for English and Persian numbers

I have a function like this: persian_numbers = '۱۲۳۴۵۶۷۸۹۰' english_numbers = '1234567890' arabic_numbers = '١٢٣٤٥٦٧٨٩٠' english_trans = string.maketrans(english_numbers, persian_numbers) arabic_trans = string.maketrans(arabic_numbers,…
Shahin
  • 1,415
  • 4
  • 22
  • 33
28
votes
5 answers

how to convert Python 2 unicode() function into correct Python 3.x syntax

I enabled the compatibility check in my Python IDE and now I realize that the inherited Python 2.7 code has a lot of calls to unicode() which are not allowed in Python 3.x. I looked at the docs of Python2 and found no hint how to upgrade: I don't…
guettli
  • 25,042
  • 81
  • 346
  • 663
25
votes
2 answers

What's "ANSI_X3.4-1968" encoding?

See following output on my system: [STEP 101] # python3 -c 'import sys; print(sys.stdout.encoding)' ANSI_X3.4-1968 [STEP 102] # [STEP 103] #…
pynexj
  • 19,215
  • 5
  • 38
  • 56
25
votes
1 answer

How to decode and encode Hebrew strings?

I am trying to encode and decode the Hebrew string "שלום". However, after encoding, I get gibberish: >>> word = "שלום" >>> word = word.decode('UTF-8') >>> word u'\u05e9\u05dc\u05d5\u05dd' >>> print word שלום >>> word = word.encode('UTF-8') >>>…
user1767774
  • 1,775
  • 3
  • 24
  • 32
23
votes
2 answers

python 2.7 lowercase

When I use .lower() in Python 2.7, the string is not converted to lowercase for the letters ŠČŽ. I read the data from a dictionary. I tried using str(tt["code"]).lower() and tt["code"].lower(). Any suggestions?
Yebach
  • 1,661
  • 8
  • 31
  • 58
20
votes
1 answer

Python-3 and \x Vs \u Vs \U in string encoding and why

Why do we have different byte oriented string representations in Python 3? Won't it be enough to have single representation instead of multiple? For ASCII range number printing a string shows a sequence starting with \x: In [56]: chr(128) Out[56]:…
MaNKuR
  • 2,578
  • 1
  • 19
  • 31
17
votes
1 answer

Will a UNICODE string just containing ASCII characters always be equal to the ASCII string?

I noticed the following holds: >>> u'abc' == 'abc' True >>> 'abc' == u'abc' True Will this always be true or could it possibly depend on the system locale? (It seems strings are unicode in python 3: e.g. this question, but bytes in 2.x)
doctorlove
  • 18,872
  • 2
  • 46
  • 62
16
votes
1 answer

Why does ElementTree reject UTF-16 XML declarations with "encoding incorrect"?

In Python 2.7, when passing a unicode string to ElementTree's fromstring() method that has encoding="UTF-16" in the XML declaration, I'm getting a ParseError saying that the encoding specified is incorrect: >>> from xml.etree import ElementTree >>>…
Henrik Heimbuerger
  • 9,924
  • 6
  • 56
  • 69
14
votes
3 answers

Load Python 2 .npy file in Python 3

I'm trying to load /usr/share/matplotlib/sample_data/goog.npy: datafile = matplotlib.cbook.get_sample_data('goog.npy', asfileobj=False) np.load(datafile) It's fine in Python 2.7, but raises an exception in Python 3.4: UnicodeDecodeError: 'ascii'…
Frozen Flame
  • 3,135
  • 2
  • 23
  • 35
13
votes
3 answers

Regex to Match Horizontal White Spaces

I need a regex in Python2 to match only horizontal white spaces not newlines. \s matches all whitespaces including newlines. >>> re.sub(r"\s", "", "line 1.\nline 2\n") 'line1.line2' \h does not work at all. >>> re.sub(r"\h", "", "line 1.\nline…
Memduh
  • 836
  • 8
  • 18
13
votes
3 answers

Python3: UnicodeEncodeError: 'ascii' codec can't encode character '\xfc'

I'm trying to get a very simple example running on OS X with Python 3.5.1, but I'm really stuck. I have read many articles that deal with similar problems, but I cannot fix this by myself. Do you have any hints on how to resolve this issue? I would…
Hans Bondoka
  • 437
  • 1
  • 4
  • 14
13
votes
1 answer

Display width of unicode strings in Python

How can I determine the display width of a Unicode string in Python 3.x, and is there a way to use that information to align those strings with str.format()? Motivating example: Printing a table of strings to the console. Some of the strings contain…
Christian Aichinger
  • 6,989
  • 4
  • 40
  • 60
12
votes
6 answers

python url unquote followed by unicode decode

I have a Unicode string like '%C3%A7%C3%B6asd+fjkls%25asd' and I want to decode it. I used urllib.unquote_plus(str), but it gives the wrong result. Expected: çöasd+fjkls%asd. Result: çöasd fjkls%asd. Double coded UTF-8 characters (%C3%A7 and %C3%B6)…
user637287
  • 123
  • 1
  • 1
  • 4