
I am using Django 1.3 and have one question. I am reading data from my database, where the encoding is set to utf8-unicode.

settings.py
DEFAULT_CHARSET = 'utf-8'

file.py
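# -*- coding: utf-8 -*-
from django.http import HttpResponse

# Gift is the asker's model; its import is not shown in the question.

def get_gift(gift_id):
    # Fetch a single visible gift by primary key.
    gift = Gift.objects.get(id__exact=gift_id, is_visible=True)
    return gift

def output(gift_id):
    gift = get_gift(gift_id)
    # gift.name is a unicode string; encode it to UTF-8 bytes for the response.
    title = gift.name.encode('utf-8')
    return HttpResponse(title)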

In response I am getting \u0411\u0435\u0441\u0435\u0434\u043a\u0430, but it should be in Russian (Cyrillic)

Cœur
Roman
  • How about you remove this extra `gift.name.encode('utf-8')`? – Torsten Engelbrecht May 17 '11 at 09:08
  • I have removed encode('utf-8') and it works when the code is return HttpResponse(name). So it means that json.dumps(name) converts it to that horror... – Roman May 17 '11 at 09:15
  • Thanks to everyone. I have found a solution - return HttpResponse(json.dumps(info).decode('raw-unicode-escape').encode('utf-8')) – Roman May 17 '11 at 09:18
  • Where are you seeing those characters? Have you tried parsing that JSON? – Dominic Rodger May 17 '11 at 09:19
  • I didn't see that you used json.dumps anywhere... Anyway, if you output something as JSON it will escape all non-ASCII characters, yes. That's the right behavior. If you load this JSON via AJAX and parse it, you will get the right output (Cyrillic in your case). – Torsten Engelbrecht May 17 '11 at 09:34
  • please submit an answer and mark it as accepted to keep stack clean – ashwoods Jun 07 '11 at 10:35
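
The comments above trace the \uXXXX output to json.dumps. Here is a minimal sketch of that behaviour, written for Python 3 (the question itself is Python 2 on Django 1.3) and using a hypothetical variable called name in place of gift.name. By default json.dumps escapes every non-ASCII character. Passing ensure_ascii=False keeps the Cyrillic text as-is. Both forms parse back to the same string, so the escaped output is valid JSON, just not human-readable.

```python
import json

# Hypothetical stand-in for gift.name: the Russian word from the question.
name = "\u0411\u0435\u0441\u0435\u0434\u043a\u0430"  # "Беседка"

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(name)

# ensure_ascii=False keeps the characters themselves in the JSON text.
readable = json.dumps(name, ensure_ascii=False)

# Any JSON parser turns both forms back into the same string.
assert json.loads(escaped) == json.loads(readable) == name

# Bytes suitable for a UTF-8 response body.
body = readable.encode("utf-8")

print(escaped)
print(readable)
```

If the client parses the response as JSON, either form should display correctly. The ensure_ascii=False variant mainly helps when the raw response is read by a person.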

1 Answer


It often happens that you have non-Roman text data in Unicode, but you can't display it -- usually because you're trying to show it to a user via an application that doesn't support Unicode, or because the fonts you need aren't accessible. You could represent the Unicode characters as "???????" or "\15BA\15A0\1610...", but that's nearly useless to the user who actually wants to read what the text says.

What Unidecode provides is a function, 'unidecode(...)' that takes Unicode data and tries to represent it in ASCII (i.e., the universally displayable characters between 0x00 and 0x7F).

The representation is almost always an attempt at transliteration -- i.e., conveying, in Roman letters, the pronunciation expressed by the text in some other writing system.

More information here

To install it, try: pip install Unidecode
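
A short sketch of what the unidecode(...) call described above looks like in practice, assuming Python 3 and that the third-party Unidecode package is installed. The variable names are hypothetical. Note that this transliterates the text into Roman letters rather than preserving the Cyrillic, so it only fits the case where ASCII output is acceptable.

```python
try:
    # Third-party package: pip install Unidecode
    from unidecode import unidecode
except ImportError:
    unidecode = None  # the package is not available in this environment

title = "\u0411\u0435\u0441\u0435\u0434\u043a\u0430"  # "Беседка"

if unidecode is not None:
    # Transliterate the Cyrillic text into plain ASCII characters.
    ascii_title = unidecode(title)
    print(ascii_title)
else:
    ascii_title = None
    print("Unidecode is not installed")
```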

ApPeL