Due to a bug in a C extension, I'm getting unicode data in str instances; in other words, a str that is not in any encoding at all but simply holds the contents of a unicode literal.
So, for instance, this is a valid unicode literal:
>>> u'\xa1Se educado!'
And the UTF-8 encoded str would be:
>>> '\xc2\xa1Se educado!'
However, what I get is a str whose bytes are the code points of the unicode literal:
>>> '\xa1Se educado!'
And I need to create a unicode instance from that. Calling unicode() on it
doesn't work, since it expects the bytes to be in some encoding (ASCII by default). I figured out that ''.join(unichr(ord(x)) for x in s)
does what I need, but it's really ugly. There has to be a better solution. Any ideas?
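
For reference, here is the workaround as a self-contained snippet, in case anyone wants to reproduce the problem. It is only a sketch: the helper name `widen_bytes`, the `bytearray` wrapper and the `unichr` fallback are mine, added so the same code runs on both Python 2 and Python 3, and the sample bytes are hard-coded rather than coming from the actual C extension.

```python
# Sketch of the workaround; runs on Python 2 and Python 3.
# The b'' / u'' prefixes just make the two types explicit.
try:
    unichr            # exists on Python 2
except NameError:
    unichr = chr      # Python 3: chr() already returns a unicode character


def widen_bytes(s):
    """Map each byte of s to the code point with the same number.

    Hypothetical helper name; same idea as
    ''.join(unichr(ord(x)) for x in s).
    """
    # bytearray yields integers on both Python 2 and Python 3
    return u''.join(unichr(b) for b in bytearray(s))


expected = u'\xa1Se educado!'      # the unicode value I want
utf8 = b'\xc2\xa1Se educado!'      # what a correctly encoded str would hold
broken = b'\xa1Se educado!'        # what the extension actually hands me

assert expected.encode('utf-8') == utf8
assert widen_bytes(broken) == expected
```

So the per-character version does give the right result for this input; I'm just hoping for something less clumsy.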