Questions tagged [python-unicode]

Python distinguishes between byte strings and unicode strings. *Decoding* transforms byte strings to unicode; *encoding* transforms unicode strings to bytes.

Remember: you decode your input to unicode, work with unicode, then encode unicode objects for output as bytes.
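A minimal sketch of that workflow, assuming UTF-8 on both sides; `read_input` and `write_output` are hypothetical helper names, not part of any library:

```python
def read_input(data: bytes, encoding: str = "utf-8") -> str:
    """Decode at the input boundary: bytes -> str."""
    return data.decode(encoding)


def write_output(text: str, encoding: str = "utf-8") -> bytes:
    """Encode at the output boundary: str -> bytes."""
    return text.encode(encoding)


raw = b"caf\xc3\xa9"             # UTF-8 bytes for "café", e.g. read from a file or socket
text = read_input(raw)           # decode once, on the way in
shouted = text.upper() + "!"     # all processing happens on str
payload = write_output(shouted)  # encode once, on the way out

# Decoding with the wrong codec fails loudly instead of silently corrupting text.
decode_error = None
try:
    b"caf\xe9".decode("utf-8")   # these are Latin-1 bytes, not valid UTF-8
except UnicodeDecodeError as exc:
    decode_error = exc
```

Keeping the decode and encode calls at the edges means everything in between handles only `str`, so no code path mixes the two types.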

1053 questions
0 votes · 0 answers

UnicodeDecodeError in smart_select in JSON QuerySet

I'm using Smart Select to join two models using another one. But when smart_select builds the filter, I get an error 500. When I put the server in Debug mode, I can see the Exception Type Traceback: File…
toledano
  • 289
  • 11
  • 20
0 votes · 1 answer

Why is a UnicodeDecodeError raised when running Python script in console, but not in Eclipse/PyDev?

My script raises a UnicodeDecodeError when run in the Windows 8 console, but not when run in Eclipse/PyDev as a launch configuration. What is the difference between the PyDev environment and running python.exe from the console with regard to…
Henrik Heimbuerger
  • 9,924
  • 6
  • 56
  • 69
0 votes · 1 answer

Python encoding error (it works at one point and then it doesn't...)

So, I'm using Python (with PyQt) and I have this strange problem. In this: self.listwithnames = ["Α.Μ.","Μονομελές-Τριμελές", "Ονοματεπώνυμο","Όνομα Πατρός","Όνομα Μητρός","Ημερομηνία Γέννησης", "Τόπος…
Antoni4040
  • 2,297
  • 11
  • 44
  • 56
0 votes · 1 answer

regex - specifying high Unicode codepoints in Python

In Python 3.3 I have no trouble using ranges of Unicode codepoints within regular expressions: >>> import re >>> to_delete = '[\u0020-\u0090\ufb00-\uffff]' >>> s = 'abcdABCD¯˘¸ðﺉ﹅ffl你我他' >>> print(s) abcdABCD¯˘¸ðﺉ﹅ffl你我他 >>> print(re.sub(to_delete, '',…
brannerchinese
  • 1,909
  • 5
  • 24
  • 40
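The rest of that question is truncated, but assuming the goal is a character class that reaches beyond U+FFFF, a sketch for Python 3.3+ (where PEP 393 makes every code point a single `str` character) could look like this. The sample string is made up for illustration:

```python
import re

# In Python 3.3+ a code point above U+FFFF is one character in a str,
# so a character-class range can span the astral planes directly.
ASTRAL = re.compile("[\U00010000-\U0010FFFF]")

s = "abc\U0001F600def\U00020000"  # an emoji and a CJK Extension B ideograph
cleaned = ASTRAL.sub("", s)       # removes both astral characters
```

The pattern is an ordinary (non-raw) string, so Python converts the `\U` escapes into characters before `re` sees them; since 3.3, `re` also understands `\u` and `\U` escapes inside raw patterns.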
0 votes · 1 answer

Converting string unicode to latin

I have the following result from some HTTP requests: Tratamento\ da\ rejei\u00E7\u00E3o\ no\ cancelamento\ da\ desagrega\u00E7\u00E3o I did some research and was able to find this line of code, which can convert utf-16 with the following line of…
thclpr
  • 5,778
  • 10
  • 54
  • 87
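Assuming the response really contains literal backslash sequences (`\u00E7` as six characters, `\ ` as an escaped space), one way to turn it into ordinary text is a small regex substitution. `unescape` is a hypothetical helper, and if the body is actually JSON, `json.loads` would handle the `\uXXXX` escapes on its own:

```python
import re

# Hypothetical helper: assumes the text contains *literal* escapes such as
# "\u00E7" and "\ ", not characters that were already decoded.
_UNICODE_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def unescape(s: str) -> str:
    s = s.replace("\\ ", " ")  # backslash-space -> plain space
    # Replace each \uXXXX with the code point it names.
    return _UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), s)


raw = r"Tratamento\ da\ rejei\u00E7\u00E3o\ no\ cancelamento\ da\ desagrega\u00E7\u00E3o"
text = unescape(raw)            # an ordinary str
latin = text.encode("latin-1")  # only if a Latin-1 byte string is really required
```

Once `text` is a normal `str`, "converting to latin" is just an `encode` at the output boundary; ç and ã both exist in Latin-1, so that call succeeds here.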
0 votes · 1 answer

Scraperwiki character encoding anomaly

Here is a ScraperWiki scraper written in Python: import lxml.html import scraperwiki from unidecode import unidecode html =…
user82216
0 votes · 1 answer

Unicode Disappearing in html.parser

I am extracting HTML from some webpage with Unicode characters as follows: def extract(url): """ Adapted from Python3_Google_Search.py """ user_agent = ("Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) " …
darksky
  • 20,411
  • 61
  • 165
  • 254
0 votes · 1 answer

Parsing log file using python and storing its valid value in database using sqlite

Hi, I am a newbie to Python. I am creating a small program which parses the load log file of a particular website and stores the valid data in particular fields of a database. But some of the fields have weird characters like '推广频道' (non-ASCII character '…
Binit Singh
  • 973
  • 4
  • 14
  • 35
0 votes · 1 answer

UnicodeDecodeError in SQLite

I'm trying to take a list of 2-element tuples and add them to a SQLite table. The first element of the tuple is a string (encoded in unicode utf-8) and the second element is a murmurhash3 hash of that utf-8 string. This is the violating line: for…
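A sketch of the usual approach, assuming the strings arrive as UTF-8 bytes: decode them to `str` before handing them to the standard-library `sqlite3` module, which stores `str` as TEXT. `zlib.crc32` stands in for murmurhash3 here purely so the sketch needs no third-party package; it is not the hash the question uses:

```python
import sqlite3
import zlib

# UTF-8 bytes as they might come out of a file; the second is the string from the log question.
raw_terms = [b"caf\xc3\xa9", "推广频道".encode("utf-8")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE terms (term TEXT, hash INTEGER)")

# Decode to str before insertion; hash the original bytes (crc32 is only a stand-in).
rows = [(b.decode("utf-8"), zlib.crc32(b)) for b in raw_terms]
conn.executemany("INSERT INTO terms (term, hash) VALUES (?, ?)", rows)
conn.commit()

# Reading back yields str objects, not bytes.
stored = [term for (term,) in conn.execute("SELECT term FROM terms ORDER BY rowid")]
```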
0 votes · 1 answer

Django: Can't figure out encoding in Django

I have this application working fine in Python 2.7. Totally! It takes "من" for example and changes it to "mn". # -*- coding: utf-8 -*- from __future__ import print_function from __future__ import unicode_literals """Kurdish Alphabet to Kurdish…
0bserver07
  • 3,390
  • 1
  • 28
  • 56
0 votes · 2 answers

making a list of traditional Chinese characters from a string

I am currently trying to estimate the number of times each character is used in a large sample of traditional Chinese characters. I am interested in characters not words. The file also includes punctuation and western characters. I am reading in an…
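One possible approach, assuming the file is UTF-8 and that "characters" means CJK Unified Ideographs (the name-based filter below skips punctuation and Latin letters, and also CJK Compatibility Ideographs); `count_hanzi` is a hypothetical helper and the sample text is made up:

```python
import unicodedata
from collections import Counter


def count_hanzi(text: str) -> Counter:
    """Count CJK unified ideographs, ignoring punctuation and Western characters."""
    return Counter(
        ch
        for ch in text
        if unicodedata.name(ch, "").startswith("CJK UNIFIED IDEOGRAPH")
    )


sample = "我愛你，你愛我。Hello 你!"
counts = count_hanzi(sample)  # the fullwidth comma and ideographic full stop are skipped
```

For a real file, something like `with open(path, encoding="utf-8") as f: counts = count_hanzi(f.read())` (with `path` being your own file) followed by `counts.most_common(20)` would list the most frequent characters.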
0 votes · 1 answer

escape character confusion with mysql and python

I want to upload the logo image from http://werkzeug.pocoo.org/ and, after saving it, I tried the following. I'm a noob, so please help. The received stream when I upload through an HTML form is…
rakesh
  • 975
  • 2
  • 11
  • 15
0 votes · 2 answers

How to properly handle non ASCII strings in python

I'm building an application that in the database has data with latin symbols. Users are able to enter this data. What I've been doing so far is encode('latin2') every user input and decode('latin2') at the very end when displaying data in the…
marcin_koss
  • 5,763
  • 10
  • 46
  • 65
0 votes · 1 answer

Python 3: dealing with stripping lines in binary mode

With the help of SO members, I was able to get as far as the following. The following is sample code; the aim is just to merge text files from a given folder and its subfolders and store the output as master.txt. But I am occasionally getting a traceback, looks like…
user1582596
  • 503
  • 2
  • 5
  • 16
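The question's code and traceback are cut off, so this is only a sketch: it merges `*.txt` files from a folder and its subfolders into `master.txt` while staying entirely in bytes, which avoids the common binary-mode mistake of combining `bytes` with `str` (for example `line.strip() + "\n"`). The `*.txt` pattern, the skipping of blank lines and the name `merge_text_files` are assumptions:

```python
from pathlib import Path


def merge_text_files(folder: Path, output: Path) -> None:
    """Concatenate *.txt files under folder (recursively) into output, in binary mode."""
    with output.open("wb") as out:
        for path in sorted(folder.rglob("*.txt")):
            if path == output:  # don't read back the file being written
                continue
            with path.open("rb") as src:
                for line in src:                # each line is a bytes object
                    stripped = line.strip()     # bytes.strip() also removes b"\r\n"
                    if stripped:                # assumption: drop blank lines
                        out.write(stripped + b"\n")  # bytes + bytes, never bytes + str
```

Because nothing is decoded, files in any encoding pass through unchanged; decoding (with an explicit `encoding=`) is only needed if the lines must be inspected as text.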
-1 votes · 1 answer

Unicode subscript r inconsistent

I want to use the following unicode characters in python to display the unit W_rms. However it seems the subscript "r" is different to the others: I used the following codes: >>> print('w\u1d63\u2098\u209b') wᵣₘₛ Any help is appreciated. I tried…
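Listing the code points with `unicodedata` shows one likely reason for the mismatch: U+1D63 comes from the Phonetic Extensions block (U+1D00 to U+1D7F), while U+2098 and U+209B come from the Superscripts and Subscripts block (U+2070 to U+209F). Fonts often draw the two blocks differently or fall back to another font for one of them, so the difference is plausibly a font issue rather than a Python one. `describe` is a hypothetical helper:

```python
import unicodedata


def describe(s: str) -> list:
    """Return 'U+XXXX NAME' for each code point in s."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '?')}" for ch in s]


for line in describe("\u1d63\u2098\u209b"):
    print(line)
# U+1D63 LATIN SUBSCRIPT SMALL LETTER R
# U+2098 LATIN SUBSCRIPT SMALL LETTER M
# U+209B LATIN SUBSCRIPT SMALL LETTER S
```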