How to decode an attachment filename with Python email?

Question

I use the following code to extract filename of the attachment:

import email.utils

msg = email.message_from_string(self.request.body) # http://docs.python.org/2/library/email.parser.html
for part in msg.walk():
    ctype = part.get_content_type()
    if ctype in ['image/jpeg', 'image/png']:
        image_file = part.get_payload(decode=True)
        image_file_name = part.get_filename()

It works well in many cases, but sometimes image_file_name contains values like =?KOI8-R?B?xsHTLTk2Mi5qcGc=?= or =?UTF-8?B?REkyeTFXMFNMNzAuanBn?=.

How should I handle such cases?

score 16 · Answer 1 · edited Sep 18 '18 at 14:12

You can use the decode_header function like this (Python 2):

from email.header import decode_header

filename = part.get_filename()
# decode_header returns a list of (value, charset) pairs; use the first one
value, charset = decode_header(filename)[0]
if charset is not None:
    filename = str(value).decode(charset)

With Python 3:

from email.message import EmailMessage
from email.header import decode_header


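def get_part_filename(msg: EmailMessage):
    filename = msg.get_filename()
    if filename is None:  # the part carries no filename parameter
        return None
    # decode_header returns a list of (value, charset) pairs; use the first one
    value, charset = decode_header(filename)[0]
    if charset is not None:
        filename = value.decode(charset)
    return filename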
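
A filename can also be split across several encoded words, in which case looking only at the first pair loses the rest. The following is a minimal sketch of a variant that joins all decoded pieces with make_header from email.header. The helper name decode_filename is hypothetical and not part of the email package.

```python
from email.header import decode_header, make_header


def decode_filename(raw):
    """Return a text filename from a possibly RFC 2047-encoded value.

    decode_filename is a hypothetical helper name used for illustration.
    """
    if raw is None:
        return None
    # decode_header yields (value, charset) pairs for every chunk;
    # make_header reassembles them and str() renders the result as text.
    return str(make_header(decode_header(raw)))
```

It could be called as decode_filename(part.get_filename()); plain ASCII filenames pass through unchanged.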

score 3 · Accepted Answer · answered Feb 11 '14 at 20:12

Between the leading '=?' and the trailing '?=' there are three parts separated by '?'. Use the first two as instructions for how to treat the third:

The first part is the character encoding (KOI8-R and UTF-8 in your examples), and the second is a 'B' to indicate base64 encoding. A 'Q' in its place would indicate quoted-printable, so you should prepare your code for that as well.
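
To illustrate that structure, here is a minimal sketch that takes one such token apart by hand. The function name decode_encoded_word is hypothetical, and the sketch handles only a single token; the standard library's email.header.decode_header already performs this decoding and is usually the better choice.

```python
import base64
import quopri


def decode_encoded_word(word):
    """Decode one '=?charset?encoding?text?=' token by hand (illustrative only)."""
    if not (word.startswith("=?") and word.endswith("?=")):
        return word  # not an encoded word; return it unchanged
    # Strip the '=?' and '?=' wrappers, then split into the three parts.
    charset, encoding, text = word[2:-2].split("?", 2)
    if encoding.upper() == "B":
        raw = base64.b64decode(text)
    elif encoding.upper() == "Q":
        # header=True makes '_' decode to a space, as the Q encoding requires
        raw = quopri.decodestring(text.encode("ascii"), header=True)
    else:
        raise ValueError("unknown encoding: " + encoding)
    return raw.decode(charset)
```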

Greg


Thanks. Does it mean that I have to decode it manually and email.utils does not have ready method to be used? – LA_ Feb 12 '14 at 06:14
OK, I've used the following code - `if image_file_name[:11] == '=?KOI8-R?B?': image_file_name = base64.b64decode(image_file_name[11:]).decode('KOI8-R') if image_file_name[:10] == '=?UTF-8?B?': image_file_name = base64.b64decode(image_file_name[10:])` – LA_ Feb 15 '14 at 14:37

score 2 · Answer 3 · answered Nov 03 '17 at 21:16

Elaborating on @Nikon's response:

from email.header import decode_header

filename = part.get_filename()
# decode_header returns a list of (value, charset) pairs; unpack the first one
fname, charset = decode_header(filename)[0]
if charset:
    filename = fname.decode(charset)

Carlo Pires


In the line where you use `decode_header`, I had to specify `[0]`. Otherwise, I got the error `ValueError: need more than 1 value to unpack`. So that line should be `fname, charset = decode_header(filename)[0]`. – backpackerhh Jul 23 '18 at 13:25
This is the best solution for me! – Vasyl Kolomiets Aug 02 '21 at 17:39
