I'm using Python 2.7, and I am trying to properly decode the subject header line of an email. The source of the email is:

Subject: =?UTF-8?B?VGkgw6ggcGlhY2l1dGEgbGEgZGVtbz8gU2NvcHJpIGFsdHJlIG4=?=

I use the function decode_header(header) from the email.header module, and the result is:

[('Ti \xc3\xa8 piaciuta la demo? Scopri altre n', 'utf-8')]

The '\xc3\xa8' part should correspond to the 'è' character, but it is not decoded/displayed correctly. Another example:

Subject: =?iso-8859-1?Q?niccol=F2_cop?= =?iso-8859-1?Q?ernico?=

Result:

[('niccol\xf2 copernico', 'iso-8859-1')]

How can I obtain the correct string?

Labo29

1 Answer

You are getting the correct string. It's just still encoded as a byte string (using UTF-8 in the first case, and iso-8859-1 in the second); you need to decode it with the reported charset to get the actual unicode string.

For example:

>>> print unicode('Ti \xc3\xa8 piaciuta la demo? Scopri altre n', 'utf-8')
Ti è piaciuta la demo? Scopri altre n

Or:

>>> print unicode('niccol\xf2 copernico', 'iso-8859-1')
niccolò copernico

That's why decode_header gives you back both the header data and the encoding.
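
If you want to do this for any header rather than by hand, something along these lines should work. This is only a minimal sketch: decode_subject is a hypothetical helper name (not part of the email package), and falling back to ASCII when no charset is reported is my assumption.

```python
from email.header import decode_header

def decode_subject(raw):
    """Hypothetical helper: decode a raw header value into a unicode string."""
    parts = []
    for data, charset in decode_header(raw):
        # Encoded words come back as byte strings together with their charset.
        # Assumption: fall back to ASCII when no charset is reported.
        if isinstance(data, bytes):
            data = data.decode(charset or 'ascii', 'replace')
        parts.append(data)
    # Assumption: parts are joined without a separator; headers that mix
    # plain and encoded text may need a space between parts on Python 2.
    return u''.join(parts)

subject = decode_subject('=?iso-8859-1?Q?niccol=F2_cop?= =?iso-8859-1?Q?ernico?=')
print(repr(subject))
```

The isinstance check is there because decode_header can hand back an already-decoded string for headers with no encoded words on newer Python versions; on 2.7 every part is a byte string and gets decoded.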

larsks