Parsing Multipart emails in python and saving attachments

Question

I am pretty new to Python and I am trying to parse email from Gmail using Python's imaplib and email modules. It is working pretty well, but I am having issues with email attachments.

I would like to parse out all of the plaintext from the email while ignoring any HTML that may be inserted as a secondary content type while also removing and saving all other attachments. I have been trying the following:

...imaplib connection and mailbox selection...

typ, msg_data = c.fetch(num, '(RFC822)')
email_body = msg_data[0][1]
mail = email.message_from_string(email_body)
for part in mail.walk():
    if part.get_content_type() == 'text/plain':
        body = body + '\n' + part.get_payload()
    else:
        continue

This was my original attempt to take just the plaintext portions of an email, but when someone sends an email with a text attachment, the contents of the text file show up in the 'body' variable above.

Can someone tell me how I can extract the plaintext portions of an email while ignoring the secondary HTML that is sometimes present, while also saving all other types of file attachments as files? I apologize if this doesn't make a lot of sense. I will update the question with more clarification if needed.

I'm sorry, I don't know much about the options available in Python, but is it possible to check the content disposition rather than the content type? I think this will show you whether the content is inline or an attachment. I'd quite like to see any responses to this question. We're quite often asked what the best way to parse email is for use with http://cloudmailin.com. Although we pass the plain text, quite often users will want to work with the full email. — Steve Smith, Jun 07 '11 at 08:55

score 11 · Accepted Answer · answered Jun 07 '11 at 19:07

If you just need to keep text attachments out of the body variable with what you have there, it should be as simple as this:

mail = email.message_from_string(email_body)
for part in mail.walk():
    c_type = part.get_content_type()
    c_disp = part.get('Content-Disposition')

    # Only text/plain parts with no Content-Disposition header count as body text
    if c_type == 'text/plain' and c_disp is None:
        body = body + '\n' + part.get_payload()
    else:
        continue

Then if the Content-Disposition indicates that it's an attachment, you should be able to use part.get_filename() and part.get_payload() to handle the file. I don't know if any of this can vary, but it's basically what I've used in the past to interface with my mail server.
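To make the attachment half concrete, here is a minimal sketch of how the same walk could collect the body and write attachments to disk. It assumes Python 3, where imaplib hands back bytes, so it uses email.message_from_bytes instead of message_from_string. The function name split_message, the save_dir parameter, and the fallback name 'attachment.bin' are made up for illustration. It also treats any part that carries a filename as an attachment, which is a judgment call rather than a rule.

```python
import email
import os


def split_message(raw_bytes, save_dir):
    """Return (body_text, saved_paths) for one raw RFC 822 message."""
    mail = email.message_from_bytes(raw_bytes)
    body_parts = []
    saved = []
    for part in mail.walk():
        if part.is_multipart():
            continue  # container parts have no payload of their own
        filename = part.get_filename()
        # get_content_disposition() returns 'attachment', 'inline' or None
        disposition = part.get_content_disposition()
        if disposition == 'attachment' or filename:
            # basename() drops any directory components in the sender-supplied name
            name = os.path.basename(filename or 'attachment.bin')
            path = os.path.join(save_dir, name)
            with open(path, 'wb') as fh:
                # decode=True undoes base64 / quoted-printable transfer encoding
                fh.write(part.get_payload(decode=True) or b'')
            saved.append(path)
        elif part.get_content_type() == 'text/plain':
            charset = part.get_content_charset() or 'utf-8'
            text = part.get_payload(decode=True).decode(charset, errors='replace')
            body_parts.append(text)
        # anything else (for example the text/html alternative) is ignored
    return '\n'.join(body_parts), saved
```

With the fetch call from the question, this would be called as something like split_message(msg_data[0][1], '/some/dir'). Note that get_payload() without decode=True returns the still-encoded text for base64 attachments, so decode=True matters when writing files. Two attachments with the same filename would overwrite each other in this sketch.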
