I'm trying to solve an issue that comes up when importing a CSV file. I store a string that contains Latin-1 characters, and when I display it, the accented characters are replaced by escape sequences. Is there anything I can do to keep the characters as they are? I simply want to keep each character as it is, nothing else.

Here's the issue (as seen from Django's manage shell):

>>> variable = "{'job_title': 'préventeur'}"
>>> variable
"{'job_title': 'pr\xc3\xa9venteur'}"

Why does Django or Python automatically change the string? Do I have to change the character set or something?

Anything will help. Thanks!

Martijn Pieters
Marco A

3 Answers

1

Your terminal is sending encoded bytes, not characters; it is configured for UTF-8, so Python receives two bytes (\xc3\xa9) when you type é.

Decode from UTF-8 in that case:

>>> print 'pr\xc3\xa9venteur'.decode('utf8')
préventeur
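As a minimal sketch of what that decode step does, the snippet below compares the byte string with the decoded text. The variable names are made up for illustration, and the `b''` and `u''` prefixes are only there so the same lines run on both Python 2 and Python 3.

```python
# The 11 bytes the UTF-8 terminal handed to Python for the 10-character word.
raw_bytes = b'pr\xc3\xa9venteur'

# Decoding turns the two bytes \xc3\xa9 into the single code point U+00E9.
decoded_text = raw_bytes.decode('utf-8')

assert decoded_text == u'pr\xe9venteur'
print(len(raw_bytes))     # 11 (bytes)
print(len(decoded_text))  # 10 (characters)
```

The byte string is one element longer because é takes two bytes in UTF-8; after decoding it counts as a single character again.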

You really want to read up on Python and Unicode though:

Martijn Pieters
0
"{'job_title': 'pr\xc3\xa9venteur'}"

The characters have been encoded into UTF-8 for you, which is pretty nice, because you don't want to stick with Latin-1 if you value your sanity. Convert to Unicode for best results:

>>> '\xc3\xa9'.decode('UTF-8')
u'é'
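To make the Latin-1 versus UTF-8 difference concrete, here is a small sketch (names are illustrative, not from the question) showing how the same character is stored under each encoding, and what happens if UTF-8 bytes are decoded with the wrong codec.

```python
# U+00E9 is the code point for e-acute.
e_acute = u'\xe9'

# Latin-1 stores it as one byte, UTF-8 as two.
latin1_bytes = e_acute.encode('latin-1')
utf8_bytes = e_acute.encode('utf-8')
assert latin1_bytes == b'\xe9'
assert utf8_bytes == b'\xc3\xa9'

# Decoding UTF-8 bytes as Latin-1 does not fail; it silently yields
# two unrelated characters (U+00C3 and U+00A9) instead of one.
wrong_decode = utf8_bytes.decode('latin-1')
assert wrong_decode == u'\xc3\xa9'
assert len(wrong_decode) == 2
```

This is why it helps to decode with the codec that actually produced the bytes, and to work with Unicode text from that point on.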
Dietrich Epp
0

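Have you tried using the print statement instead? The interactive shell echoes the repr() of a value, which escapes non-ASCII bytes, whereas print writes the string to the terminal as-is.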

>>> variable = "{'job_title': 'préventeur'}"

>>> variable
"{'job_title': 'pr\x82venteur'}"

>>> repr(variable)
'"{\'job_title\': \'pr\\x82venteur\'}"'

>>> print variable
{'job_title': 'préventeur'}
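A rough sketch of the same point, written so it runs on Python 2 and 3: the escapes appear only in the repr of the byte string, while the data itself is unchanged. The \x82 in the transcript above is consistent with a console using code page 437 (an assumption about that terminal, not something the question states); a UTF-8 terminal would show \xc3\xa9 instead.

```python
# Bytes as a UTF-8 terminal would deliver them.
utf8_word = b'pr\xc3\xa9venteur'

# What the interactive shell echoes is repr(), which escapes non-ASCII bytes.
shell_echo = repr(utf8_word)
assert '\\xc3\\xa9' in shell_echo

# The underlying data is untouched: still 11 bytes, no literal backslashes.
assert len(utf8_word) == 11
assert b'\\' not in utf8_word

# Assumption: a code page 437 console sends e-acute as the single byte 0x82.
cp437_word = b'pr\x82venteur'
assert cp437_word.decode('cp437') == utf8_word.decode('utf-8')
```

In both cases nothing was altered in the string; only the way the shell displays it differs from what print shows.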
Guddu