I could use some help reading email that contains special characters and outputting it with those characters intact.

So far I have seen email arrive with these character sets: "UTF-8", "windows-1252", and "ISO-8859-1".

I have seen

‘hey’ 

represented as

=91hey=92

and

‘’,“”=hey÷½Öñ♦→%@¥÷

represented as

=E2=80=98=E2=80=99,=E2=80=9C=E2=80=9D=3Dhey=C3=B7=C2=BD=C3=96=C3=B1=E2=99=A6=E2=86=92%@=C2=A5=C3=B7

(I removed two = signs above because of line breaks)

Sometimes these seem to be hex representations, because I can put them into binascii.unhexlify() and get the proper result back.

There were a few others where, if I replaced each = with \x and ran the result through "myString".decode('iso-8859-1'), I got the right thing.

I'm pretty confused. How do I decode the email text?
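
For reference, here is a minimal sketch of how the two samples above could be decoded if they are quoted-printable, which is what the `=XX` pattern suggests. It assumes Python 3 and that the charset of each sample is already known; `decode_qp` is a hypothetical helper name, not something from a library. The `quopri` module turns each `=XX` escape back into one raw byte, and the charset then says how to turn those bytes into text.

```python
import quopri

# The two samples from the question, as raw bytes.
cp1252_sample = b"=91hey=92"
utf8_sample = (
    b"=E2=80=98=E2=80=99,=E2=80=9C=E2=80=9D=3Dhey=C3=B7=C2=BD=C3=96=C3=B1"
    b"=E2=99=A6=E2=86=92%@=C2=A5=C3=B7"
)


def decode_qp(raw, charset):
    """Undo quoted-printable, then decode the resulting bytes with charset."""
    return quopri.decodestring(raw).decode(charset)


first = decode_qp(cp1252_sample, "windows-1252")
second = decode_qp(utf8_sample, "utf-8")

if __name__ == "__main__":
    print(first)
    print(second)
```

This would also explain why `binascii.unhexlify()` appeared to work: each `=XX` is the hex value of a single byte. Note that bytes 0x91 and 0x92 are curly quotes only in windows-1252; in ISO-8859-1 they are control characters, so the charset has to match the message.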

  • 7
    You need to read the email headers to see how it's encoded. Your example looks like quoted-printable, and would be decoded with the `quopri` module in the standard library. – Wooble Aug 15 '11 at 15:02
  • 1
    This works for my second example, which I believe is `Content-Type: text/html; charset=UTF-8` with `Content-Transfer-Encoding: quoted-printable`. [link](http://docs.python.org/release/2.4/lib/module-quopri.html) and [link](http://effbot.org/librarybook/quopri.htm) helped. The first example, I believe, is `Content-Type: text/html; charset=windows-1252` with `Content-Transfer-Encoding: quoted-printable`, which seems to need different treatment. – Jess Barter Aug 15 '11 at 15:22
  • 1
    This seems to do it: [pastebin](http://pastebin.com/D9hn68p5). Thanks a bunch! If this weren't a comment I'd mark it as the right answer, since it seems to solve my issues for now. But it wouldn't be the first time I thought I was on the right track. Thank you very much, this has helped a lot. – Jess Barter Aug 15 '11 at 15:40
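
The advice in the comments (read the headers, then decode accordingly) can be sketched with the standard library `email` package, which reads `Content-Transfer-Encoding` and the `charset` parameter for you. This is only a sketch, not the contents of the pastebin above: it assumes Python 3, and `raw_message` and `decoded_text_parts` are made-up names with a hand-built sample message for illustration.

```python
import email

# A tiny hand-built message for illustration; real input would be the raw
# bytes of a message as received.
raw_message = (
    b"Content-Type: text/html; charset=windows-1252\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"=91hey=92\r\n"
)


def decoded_text_parts(raw):
    """Return the text/* parts of a raw message as decoded strings."""
    msg = email.message_from_bytes(raw)
    texts = []
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue  # skip multipart containers and attachments
        # decode=True undoes the transfer encoding (quoted-printable, base64).
        payload = part.get_payload(decode=True)
        # Fall back to UTF-8 when no charset is declared (an assumption).
        charset = part.get_content_charset() or "utf-8"
        texts.append(payload.decode(charset, errors="replace"))
    return texts


if __name__ == "__main__":
    for text in decoded_text_parts(raw_message):
        print(text)
```

Because the charset comes from each part's own headers, the same function should cope with the UTF-8 and windows-1252 messages without special-casing either one. A message that declares a charset Python does not know would still raise `LookupError` here.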
