0

I want to create a list of letters from this string in python:

string = 'utf✮fff'

I have tried this:

>>> string = "utf✮fff"
>>> print list(string)
['u', 't', 'f', '\xe2', '\x9c', '\xae', 'f', 'f', 'f']

However it should look like this:

['u', 't', 'f', '\xe2\x9c\xae', 'f', 'f', 'f']

Does anyone know how to create this output? Thank you in advance!

  • 4
    You're dealing with a byte string. Use a unicode string if you want Python to be aware of individual *characters*: `u'utf✮fff'` – deceze Dec 02 '16 at 12:44
  • Thank you! It worked, how can I apply this on string variables? like: `unicodestring = string.encode('unicode')` – Daniel Tremer Dec 02 '16 at 12:56
  • @DanielTremer "unicode" is not an encoding. You have to use var = unicode(var) – Pierre Barre Dec 02 '16 at 13:00
  • @DanielTremer: unicodestring = string.decode('utf8'). But don't actually call it string, that's the name of a standard library module. – RemcoGerlich Dec 02 '16 at 13:05

1 Answers1

1

You have to use unicode strings:

string = u'utf✮fff'
print list(string)

[u'u', u't', u'f', u'\u272e', u'f', u'f', u'f']


string = 'utf✮fff'
string = unicode(string)
print list(string)

[u'u', u't', u'f', u'\u272e', u'f', u'f', u'f']

Note that in your case, you will have to set the

# coding: utf8

header.

Pierre Barre
  • 2,174
  • 1
  • 11
  • 23
  • Works! Did it this way without setting the header: `string = 'utf✮fff' string = string.decode('utf-8') string = unicode(string) print list(string)` – Daniel Tremer Dec 02 '16 at 13:12
  • @DanielTremer That's valid too :) Note you can also use sys.setdefaultencoding('utf8') – Pierre Barre Dec 02 '16 at 13:14