Python imaplib - get_filename() not working when attachment has UTF-8 characters

Question

I have this function that downloads all attachments from a given email using imaplib:

# Download all attachment files for a given email
def downloaAttachmentsInEmail(m, emailid, outputdir, markRead):
    resp, data = m.uid("FETCH", emailid, "(BODY.PEEK[])")
    email_body = data[0][1]
    mail = email.message_from_bytes(email_body)
    if mail.get_content_maintype() != 'multipart':
        return
    for part in mail.walk():
        if part.get_content_maintype() != 'multipart' and part.get('Content-Disposition') is not None:
            open(outputdir + '/' + part.get_filename(), 'wb').write(part.get_payload(decode=True))
    if(markRead):
        m.uid("STORE", emailid, "+FLAGS", "(\\Seen)")

It fails when the attachment's file name contains UTF-8 characters. I get the error below, which I guess happens because part.get_filename() isn't reading the name correctly:

    OSError: [Errno 22] Invalid argument: './temp//=?UTF-8?B?QkQgUmVsYXTDs3JpbyAywqogRmFzZS5kb2M=?=\r\n\t=?UTF-8?B?eA==?='

What can I do to fix this?

score 4 · Accepted Answer · answered May 24 '20 at 12:15

I found a solution: decode the file name with decode_header before using it as a path.

import email
import os
from email.header import decode_header

# Download all attachment files for a given email
def downloaAttachmentsInEmail(m, emailid, outputdir, markRead):
    resp, data = m.uid("FETCH", emailid, "(BODY.PEEK[])")
    email_body = data[0][1]
    mail = email.message_from_bytes(email_body)
    if mail.get_content_maintype() != 'multipart':
        return
    for part in mail.walk():
        if part.get_content_maintype() != 'multipart' and part.get('Content-Disposition') is not None:
            raw_name = part.get_filename()
            if raw_name is None:
                continue  # Content-Disposition header without a filename
            # decode_header returns (chunk, charset) pairs; chunks are bytes when the
            # name contained encoded words, so decode each one and join them all
            filename = ''.join(
                chunk.decode(charset or 'ascii', 'replace') if isinstance(chunk, bytes) else chunk
                for chunk, charset in decode_header(raw_name))
            with open(os.path.join(outputdir, filename), 'wb') as f:
                f.write(part.get_payload(decode=True))
    if markRead:
        m.uid("STORE", emailid, "+FLAGS", "(\\Seen)")

Note that the decode_header function must be imported from email.header. — matt123788, Jun 09 '21 at 15:28
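To try the decoding step on its own, outside the download loop, here is a minimal sketch that feeds it the raw name from the question's error message. The helper name `decode_filename` is made up for illustration; `decode_header` and `make_header` are the standard `email.header` functions.

```python
from email.header import decode_header, make_header

def decode_filename(raw_name):
    """Hypothetical helper: turn a raw, possibly encoded, file name into text."""
    # make_header reassembles every (chunk, charset) pair that decode_header
    # yields, so a name folded into several encoded words comes back whole
    return str(make_header(decode_header(raw_name)))

# The raw name from the error message in the question (two folded encoded words)
raw = '=?UTF-8?B?QkQgUmVsYXTDs3JpbyAywqogRmFzZS5kb2M=?=\r\n\t=?UTF-8?B?eA==?='
decoded = decode_filename(raw)           # 'BD Relatório 2ª Fase.docx'
plain = decode_filename('plain.txt')     # 'plain.txt', unencoded names pass through
```

Going through `make_header` rather than indexing `[0]` avoids depending on how many chunks `decode_header` happens to return.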

Felipe Vegini · Answer 2 · 2020-09-09T11:34:15.653

It's an old question, but I ran into this problem and had a hard time finding a solution, so maybe this can help other people!

Edit: This covers only decoding the filename into readable text, not downloading the attachment!

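import re
import base64
import quopri

def encoded_words_to_text(encoded_words):
    # Matches a single encoded word of the form =?charset?B-or-Q?payload?=
    encoded_word_regex = r'=\?([^?]+)\?([BbQq])\?([^?]*)\?='
    match = re.match(encoded_word_regex, encoded_words)
    if match is None:
        return encoded_words  # not an encoded word, return it unchanged
    charset, encoding, encoded_text = match.groups()
    try:
        if encoding.upper() == 'B':
            byte_string = base64.b64decode(encoded_text)
        else:
            # header=True also maps '_' to a space, as the Q encoding requires
            byte_string = quopri.decodestring(encoded_text.encode('ascii'), header=True)
        return byte_string.decode(charset)
    except (ValueError, LookupError):
        return encoded_words  # bad payload or unknown charset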

Results:

test_string = '=?utf-8?B?SUJUIFB1cmNoYXNlIE9yZGVyLnBkZg==?='
encoded_words_to_text(test_string)
'IBT Purchase Order.pdf'
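Note that the function above decodes one encoded word only. A long name can be folded into several encoded words, as in the question's error message, and then only the first word would be decoded. The following is a rough sketch of one way to extend the same regex idea to every word; `decode_all_encoded_words`, `ENCODED_WORD` and `BETWEEN_WORDS` are hypothetical names, not part of any library.

```python
import base64
import quopri
import re

# One encoded word: =?charset?B-or-Q?payload?=
ENCODED_WORD = re.compile(r'=\?([^?]+)\?([BbQq])\?([^?]*)\?=')
# Whitespace (including a folded line break) between two encoded words
BETWEEN_WORDS = re.compile(r'(\?=)\s+(=\?)')

def decode_all_encoded_words(value):
    """Hypothetical helper: decode every encoded word found in value."""
    def decode_one(match):
        charset, encoding, payload = match.groups()
        if encoding.upper() == 'B':
            raw = base64.b64decode(payload)
        else:
            # header=True also maps '_' to a space, as the Q encoding requires
            raw = quopri.decodestring(payload.encode('ascii'), header=True)
        return raw.decode(charset)

    # Drop the separator between adjacent encoded words, then decode each word
    joined = BETWEEN_WORDS.sub(r'\1\2', value)
    return ENCODED_WORD.sub(decode_one, joined)

folded = '=?UTF-8?B?QkQgUmVsYXTDs3JpbyAywqogRmFzZS5kb2M=?=\r\n\t=?UTF-8?B?eA==?='
full_name = decode_all_encoded_words(folded)   # 'BD Relatório 2ª Fase.docx'
```

Unlike the original function, this sketch raises on a malformed payload or unknown charset instead of returning the input unchanged; wrap the call in a try/except if that fallback is wanted.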

Can you add all imported modules? — Banana, Sep 08 '20 at 15:00
