
I am using the IMAPClient library in Python. I can download the documents attached to an email, and I am only interested in Excel files.

I would also like to extract the recipient list from the email. Any idea how to do this in Python?

Here is a code snippet that might be useful:

import email

for ind_mail in emails:
    # Parse the raw RFC 822 message returned by the server
    msg_string = ind_mail['RFC822'].decode("utf-8")
    email_msg = email.message_from_string(msg_string)

    for part in email_msg.walk():
        # Download only Excel files
        filetype = part.get_content_type()

        if filetype == 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet':
            # download: the decoded attachment bytes
            attachment = part.get_payload(decode=True)
Kiran
Your question isn't entirely well-defined. The message headers can contain a smattering of `To:`, `Cc:`, and `Bcc:`, none of which are guaranteed to contain the actual recipients; if a message is resent, `Resent-To:`, `Resent-Cc:`, etc. override those, with the same caveat. The envelope recipient list is not stored anywhere, but something like `Delivered-To:` is added by many modern mail servers to identify which local recipient actually received a message. Could you elaborate on what information precisely you are hoping to extract? – tripleee Sep 15 '15 at 07:35

2 Answers


The straightforward answer to your question is to get the corresponding headers' values, i.e.:

to_rcpt = email_msg.get_all('to', [])
cc_rcpt = email_msg.get_all('cc', [])

Put these inside that first loop. RFC 5322 allows at most one To: and one Cc: header per message, but the parser does not enforce this and non-compliant messages exist, hence get_all; if a header is absent, you'll still get an empty list for a subsequent loop.
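As a quick illustration of how get_all behaves, here is a minimal sketch on a made-up message (the raw text and the addresses are assumptions for the example, not taken from the question):

```python
import email

# Hypothetical raw message, used only to illustrate get_all
raw = (
    "From: sender@example.com\n"
    "To: alice@example.com, bob@example.com\n"
    "Subject: report\n"
    "\n"
    "body\n"
)

email_msg = email.message_from_string(raw)

# Header lookup is case-insensitive; the second argument is returned
# when the header is absent.
to_rcpt = email_msg.get_all('to', [])
cc_rcpt = email_msg.get_all('cc', [])

print(to_rcpt)  # ['alice@example.com, bob@example.com']
print(cc_rcpt)  # []
```

Note that the single To: header comes back as one unsplit string, and the missing Cc: header simply yields the empty default.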

But as tripleee has rightly pointed out, these headers can easily be censored, spoofed, or simply removed. Still, they are the only recipient information a server persists and returns, and they are what every mail client uses to show us the recipients :)
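If you want to look at every header that may name a recipient, something along these lines could work. It is only a sketch: the helper name header_recipients, the choice of headers in RECIPIENT_HEADERS, and the sample message are all assumptions, and it uses getaddresses from the standard email.utils module to split each header value into individual addresses.

```python
import email
from email.utils import getaddresses

# Header names to inspect; this particular selection is an assumption.
RECIPIENT_HEADERS = ('to', 'cc', 'bcc', 'resent-to', 'resent-cc', 'delivered-to')


def header_recipients(raw_bytes):
    """Return the unique addresses found in the recipient-like headers."""
    msg = email.message_from_bytes(raw_bytes)
    seen = []
    for name in RECIPIENT_HEADERS:
        # get_all returns [] when the header is absent
        for _display, addr in getaddresses(msg.get_all(name, [])):
            if addr and addr not in seen:
                seen.append(addr)
    return seen


# Hypothetical raw message, as bytes like a server would return it
sample = (
    b"From: sender@example.com\r\n"
    b"To: Alice <alice@example.com>, bob@example.com\r\n"
    b"Cc: bob@example.com\r\n"
    b"Delivered-To: alice@example.com\r\n"
    b"\r\n"
    b"body\r\n"
)

print(header_recipients(sample))  # ['alice@example.com', 'bob@example.com']
```

The result is still only what the headers claim, so the caveat about spoofed or stripped headers applies to it as well.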

Todor Minakov

Calling msg.get_all returns a list with one entry per header, so if the message has several To: headers, you'll get one entry for each.

BUT

If a single header holds multiple addresses separated by commas, you will only get one string and you'll have to split it yourself.

The best way to get the list of all the addresses from a specific header is to use getaddresses (https://docs.python.org/3/library/email.utils.html#email.utils.getaddresses)

from email.utils import getaddresses

to_rcpt = getaddresses(email_msg.get_all('to', []))

get_all returns a list of all the "To:" header values, and getaddresses parses each entry and returns every address present in each header as a (name, address) tuple. For instance:

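import email
from email.utils import getaddresses

# The headers must start on the first line; a leading blank line would end the header block
message = """\
To: "Bob" <email1@gmail.com>, "John" <email2@gmail.com>
To: email3@gmail.com, email4@gmail.com

"""

email_msg = email.message_from_string(message)
to_rcpt = getaddresses(email_msg.get_all('to', []))
# => [('Bob', 'email1@gmail.com'), ('John', 'email2@gmail.com'), ('', 'email3@gmail.com'), ('', 'email4@gmail.com')]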
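To see why splitting the string by hand is fragile, here is a small sketch with a display name that itself contains a comma (the header value is made up for the example):

```python
from email.utils import getaddresses

# Hypothetical header value: the quoted display name contains a comma
header_value = '"Doe, John" <john@example.com>, jane@example.com'

# Naive splitting cuts the quoted display name in half
naive = [piece.strip() for piece in header_value.split(',')]
print(naive)
# ['"Doe', 'John" <john@example.com>', 'jane@example.com']

# getaddresses respects the quoting and yields one tuple per address
parsed = getaddresses([header_value])
print(parsed)
# [('Doe, John', 'john@example.com'), ('', 'jane@example.com')]
```

Note that getaddresses expects a list of header values, which is exactly what get_all returns.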
Cyril N.