
I tried to adapt a script I found through Google. It worked perfectly with the emails I was receiving before, because it extracted the "From" field directly, and with those I didn't get the error.

Here is what my code looks like:

```python
#!/usr/bin/python

import imaplib
import sys
import email
import re

#FOLDER=sys.argv[1]
FOLDER='folder'
LOGIN='login@gmail.com'
PASSWORD='password'
IMAP_HOST = 'imap.gmail.com'  # Change this according to your provider

email_list = []
email_unique = []

mail = imaplib.IMAP4_SSL(IMAP_HOST)
mail.login(LOGIN, PASSWORD)
mail.select(FOLDER)

result, data = mail.search(None, 'ALL')
ids = data[0]
id_list = ids.split()
for i in id_list:
    typ, data = mail.fetch(i,'(RFC822)')
    for response_part in data:
        if isinstance(response_part, tuple):
            msg = email.message_from_string(response_part[1])
            sender = msg['reply-to'].split()[0]
            address = re.sub(r'[<>]','',sender)
    # Ignore any occurrences of own email address and add to list
    if not re.search(r'' + re.escape(LOGIN),address) and not address in email_list:
        email_list.append(address)
        print address
```
Vanity
  • An email may not have a `reply-to` header. That is normal. – furas Dec 06 '16 at 14:43
  • All the emails do have `reply-to`. The problem is that it isn't extracting the email address; that's the only issue here. – Vanity Dec 06 '16 at 14:48
  • If you are sure that your emails have `reply-to`, then use `print()` to see what you have in `msg`, in `msg['reply-to']`, and in `msg['reply-to'].split()`. – furas Dec 06 '16 at 14:51
  • The result of `msg['reply-to']` is what I'm looking for, but I only need what is between `<` and `>` (the email address itself). – Vanity Dec 06 '16 at 14:59
  • And what do you have in `msg['reply-to'].split()`? Maybe you need `[-1]` instead of `[0]`. – furas Dec 06 '16 at 15:24
  • Alright, thanks, that fixed everything. I simply needed to change that line to: `sender = msg['reply-to'].split()[-1]` – Vanity Dec 06 '16 at 15:27
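A small sketch of why the one-character fix in the last comment works, and where it stops working. It uses hard-coded header strings (illustrative values, not taken from the asker's mailbox) and the same `re.sub(r'[<>]', '', ...)` cleanup as the question's code:

```python
import re

# Illustrative Reply-To value with a display name in front of the address.
header = "Sally Smith <sally@foo.com>"
tokens = header.split()

# split()[0] is the first word of the display name, not the address.
first = re.sub(r"[<>]", "", tokens[0])
# split()[-1] is the bracketed address, as long as it is the last token.
last = re.sub(r"[<>]", "", tokens[-1])
print(first, last)  # Sally sally@foo.com

# The same trick breaks on the older "address (comment)" form of the header.
legacy = "sally@foo.com (Sally Smith)"
legacy_last = re.sub(r"[<>]", "", legacy.split()[-1])
print(legacy_last)  # Smith)
```

So `[-1]` is fine for the `Name <address>` shape the asker was seeing, but it is positional guesswork rather than actual header parsing.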

1 Answer


Instead of messing around with string splitting and slicing, the correct approach is to use `parseaddr` from the `email.utils` module in the standard library. It correctly handles the various legal address formats in email headers.

Some examples:

```python
>>> from email.utils import parseaddr
>>> parseaddr("sally@foo.com")
('', 'sally@foo.com')
>>> parseaddr("<sally@foo.com>")
('', 'sally@foo.com')
>>> parseaddr("Sally <sally@foo.com>")
('Sally', 'sally@foo.com')
>>> parseaddr("Sally Smith <sally@foo.com>")
('Sally Smith', 'sally@foo.com')
```
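A few more formats make the difference from string splitting clearer. This sketch (sample values are illustrative) runs `parseaddr` over a quoted display name that contains a comma and over the older `address (comment)` form, which defeats the `split()[-1]` trick from the comments. Only the address half of each `(display_name, address)` pair is kept:

```python
from email.utils import parseaddr

# Illustrative header values in several legal formats.
samples = [
    "sally@foo.com",
    "Sally Smith <sally@foo.com>",
    '"Smith, Sally" <sally@foo.com>',  # quoted display name containing a comma
    "sally@foo.com (Sally Smith)",     # older comment-style display name
]

# parseaddr returns a (display_name, address) pair; keep only the address.
addresses = [parseaddr(value)[1] for value in samples]
print(addresses)
```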

Also, you shouldn't assume that emails have a Reply-To header. Many do not.
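Combining both points, here is a minimal Python 3 sketch. The question's code is Python 2; in Python 3 `imaplib` returns `bytes`, hence `email.message_from_bytes` instead of `message_from_string`. The helper names `reply_address` and `collect_addresses` are hypothetical, and the raw messages and envelope strings below are stand-ins for real `mail.fetch(i, '(RFC822)')` results, so it runs without an IMAP server. It prefers Reply-To, falls back to From when Reply-To is missing, and extracts the address with `parseaddr`:

```python
import email
from email.utils import parseaddr


def reply_address(raw_message: bytes) -> str:
    """Return the Reply-To address, falling back to From; '' if neither is present."""
    msg = email.message_from_bytes(raw_message)
    # Many messages have no Reply-To header, so fall back to From.
    header = msg.get("Reply-To") or msg.get("From") or ""
    return parseaddr(str(header))[1]


def collect_addresses(fetch_data, own_address):
    """Return unique addresses from imaplib fetch() data, skipping own_address."""
    found = []
    for part in fetch_data:
        # Message parts arrive as (envelope, raw_bytes) tuples; other items are bytes like b")".
        if not isinstance(part, tuple):
            continue
        address = reply_address(part[1])
        # Skip empty results, our own address (case-insensitive), and duplicates.
        if address and address.lower() != own_address.lower() and address not in found:
            found.append(address)
    return found


# Hypothetical raw messages standing in for real IMAP fetch results.
raw_with_reply_to = (
    b"From: Sally Smith <sally@foo.com>\r\n"
    b"Reply-To: Help Desk <help@foo.com>\r\n"
    b"Subject: a\r\n\r\nbody\r\n"
)
raw_from_only = b"From: Bob <bob@bar.org>\r\nSubject: b\r\n\r\nbody\r\n"
raw_own = b"From: Me <login@gmail.com>\r\nSubject: c\r\n\r\nbody\r\n"

# Envelope strings are placeholders; only the tuple shape matters here.
fetch_data = [
    (b"1 (RFC822 {0}", raw_with_reply_to), b")",
    (b"2 (RFC822 {0}", raw_from_only), b")",
    (b"3 (RFC822 {0}", raw_own), b")",
    (b"4 (RFC822 {0}", raw_with_reply_to), b")",
]
print(collect_addresses(fetch_data, "login@gmail.com"))
```

To use it in the question's loop, gather the `data` from each `mail.fetch(i, '(RFC822)')` call into one list and pass it along with `LOGIN`. The own-address check here is an exact, case-insensitive comparison rather than the question's `re.search`, which would also skip any address that merely contains `LOGIN` as a substring.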

Menno Smits