
I have the plain text of a Cc header field that looks like so:

friend@email.com, John Smith <john.smith@email.com>,"Smith, Jane" <jane.smith@uconn.edu>

Are there any battle-tested modules for parsing this properly?

(bonus if it's in Python! the email module just returns the raw text without any methods for splitting it, AFAIK) (also bonus if it splits name and address into two fields)

Acorn
smurthas

4 Answers


There are a bunch of functions available in the standard Python library, but I think you're looking for email.utils.parseaddr() or email.utils.getaddresses():

>>> import email.utils
>>> addresses = 'friend@email.com, John Smith <john.smith@email.com>,"Smith, Jane" <jane.smith@uconn.edu>'
>>> email.utils.getaddresses([addresses])
[('', 'friend@email.com'), ('John Smith', 'john.smith@email.com'), ('Smith, Jane', 'jane.smith@uconn.edu')]
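If the addresses come from a parsed message rather than a literal string, a minimal sketch (assuming the default compat32 policy, where header values come back as plain strings) is to collect the raw Cc values with Message.get_all() and hand that whole list to getaddresses(). The sample message text below is made up for illustration:

```python
import email
from email.utils import getaddresses

# Hypothetical raw message; only the Cc header matters here.
raw = (
    "From: someone@example.com\n"
    'Cc: friend@email.com, John Smith <john.smith@email.com>,"Smith, Jane" <jane.smith@uconn.edu>\n'
    "Subject: test\n"
    "\n"
    "body\n"
)

msg = email.message_from_string(raw)

# get_all() returns every Cc header value as a list (or the default if absent);
# getaddresses() splits those values into (realname, address) pairs.
cc_pairs = getaddresses(msg.get_all("cc", []))

for name, addr in cc_pairs:
    print(repr(name), addr)
```

This also gives the name and the address as two separate fields, with an empty string as the name when the header only contains a bare address.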
Jonathan Berger
Martin Tournoij
  • These functions work great, but they both require you to already have your addresses split into individual strings. – Acorn Mar 25 '11 at 02:11
  • `[email.utils.parseaddr(a) for a in alist.split(',')]` where `alist` is the address list you posted above. – Martin Tournoij Mar 25 '11 at 02:17
  • @Carpetsmoker: The name for the 3rd address contains a comma, so that doesn't work. – Acorn Mar 25 '11 at 02:26
  • Yeah, I thought of that after I posted. Some further reading, and a careful re-reading of the ``email.utils.getaddresses()`` documentation, revealed that you need to pass a **list**, not a **string**, d'oh! So use ``email.utils.getaddresses([alist])`` – Martin Tournoij Mar 25 '11 at 02:29
  • Yes, therein lies the problem that the OP seeks a solution to. The fact that "the email module just returns the raw text without any methods for splitting it" :) – Acorn Mar 25 '11 at 02:32
  • I'm confused, split what into what? – Martin Tournoij Mar 25 '11 at 02:34
  • Splitting a string containing a list of addresses into a list containing address strings. – Acorn Mar 25 '11 at 03:20
  • Oh! `getaddresses()` doesn't need a list of individual address strings! You can just pass it a list with a single string in it containing multiple addresses. I feel stupid now.. – Acorn Mar 25 '11 at 14:26
  • Haha, you should feel smart for taking an interest and learning something new :-) My description above wasn't too clear, I guess ... – Martin Tournoij Mar 25 '11 at 14:38
  • Almost too good to be true. Thanks [Jonathan Berger](https://stackoverflow.com/users/518222/jonathan-berger)! – Krishan Gupta Oct 26 '17 at 04:57
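To make the point from the comments concrete, here is a small sketch comparing a naive split on commas with passing the whole header string, wrapped in a one-element list, to getaddresses():

```python
from email.utils import getaddresses

header = 'friend@email.com, John Smith <john.smith@email.com>,"Smith, Jane" <jane.smith@uconn.edu>'

# Naive approach: the comma inside "Smith, Jane" produces a bogus extra piece.
naive_pieces = header.split(",")
print(len(naive_pieces))  # 4 pieces for only 3 addresses

# getaddresses() expects an iterable of header values, so wrap the single
# string in a list; it respects the quoting and returns one pair per address.
pairs = getaddresses([header])
print(len(pairs))  # 3
print(pairs[2])    # ('Smith, Jane', 'jane.smith@uconn.edu')
```

Passing the bare string instead of a list would make getaddresses() iterate over its individual characters, which is not what you want.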

Convert a string containing multiple email addresses (with names) into a dictionary.

emailstring = 'Friends <friend@email.com>, John Smith <john.smith@email.com>,"Smith" <jane.smith@uconn.edu>'

Split the string on commas:

email_list = emailstring.split(',')

Then make a dictionary with the name as the key and the email address as the value:

import email.utils

email_dict = dict(map(email.utils.parseaddr, email_list))

The result looks like this:

{'John Smith': 'john.smith@email.com', 'Friends': 'friend@email.com', 'Smith': 'jane.smith@uconn.edu'}

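Note:

If the same name appears with different email addresses, only one of them is kept: dict() keeps the last value for a repeated key, so the earlier address is silently dropped. For example:

'Friends <friend@email.com>, John Smith <john.smith@email.com>,"Smith" <jane.smith@uconn.edu>, Friends <friend_co@email.com>'

Here "Friends" appears twice, so the resulting dictionary contains only one "Friends" entry.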
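A variant of the same idea that avoids both problems (commas inside quoted names, and duplicate names overwriting each other) is sketched below, assuming you want every address kept: it swaps split(',') for getaddresses() and collects the addresses for each name in a list. The function name emails_by_name is just an illustrative, hypothetical helper.

```python
from collections import defaultdict
from email.utils import getaddresses


def emails_by_name(header):
    """Hypothetical helper: map each display name to all addresses that use it."""
    result = defaultdict(list)
    # getaddresses() honours quoting, so "Smith, Jane" stays a single name.
    for name, addr in getaddresses([header]):
        result[name].append(addr)
    return dict(result)


emailstring = ('Friends <friend@email.com>, "Smith, Jane" <jane.smith@uconn.edu>, '
               'Friends <friend_co@email.com>')
print(emails_by_name(emailstring))
# {'Friends': ['friend@email.com', 'friend_co@email.com'], 'Smith, Jane': ['jane.smith@uconn.edu']}
```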

ase
  • The parser needs to handle addresses that have a comma in the name - e.g. "Smith, Jane", as that is a valid name for an email address. The emailstring.split(',') command will split up the "Smith, Jane" name into two separate addresses. – smurthas Jul 22 '15 at 17:40

I haven't used it myself, but it looks to me like you could use the csv package quite easily to parse the data.
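A quick sketch of that csv idea, for illustration only: csv.reader with skipinitialspace=True does split only on commas outside double quotes, but it also strips the quote characters (and, in its default non-strict mode, appends the text after the closing quote to the same field). So the display name in the third field is no longer quoted, and re-parsing those fields with email.utils may not give the name you expect.

```python
import csv

header = 'friend@email.com, John Smith <john.smith@email.com>,"Smith, Jane" <jane.smith@uconn.edu>'

# csv.reader takes an iterable of lines; here there is just one line.
# skipinitialspace drops the space that follows each delimiter.
fields = next(csv.reader([header], skipinitialspace=True))

for field in fields:
    print(field)
# The third field comes back as: Smith, Jane <jane.smith@uconn.edu>
# i.e. split in the right place, but without the quotes around the name.
```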

Demian Brecht

The code below is completely unnecessary. I wrote it before realising that you could pass getaddresses() a list containing a single string containing multiple addresses.

I haven't had a chance to look at the specifications for addresses in email headers, but based on the string you provided, this code should do the job of splitting it into a list, ignoring commas that appear within quotes (and are therefore part of a name).

from email.utils import getaddresses

addrstring = ',friend@email.com, John Smith <john.smith@email.com>,"Smith, Jane" <jane.smith@uconn.edu>,'

def addrparser(addrstring):
    addrlist = ['']
    quoted = False

    # ignore comma at beginning or end
    addrstring = addrstring.strip(',')

    for char in addrstring:
        if char == '"':
            # toggle quoted mode
            quoted = not quoted
            addrlist[-1] += char
        # a comma outside of quotes means a new address
        elif char == ',' and not quoted:
            addrlist.append('')
        # anything else is the next letter of the current address
        else:
            addrlist[-1] += char

    return getaddresses(addrlist)

print(addrparser(addrstring))

Gives:

[('', 'friend@email.com'), ('John Smith', 'john.smith@email.com'),
 ('Smith, Jane', 'jane.smith@uconn.edu')]

I'd be interested to see how other people would go about this problem!
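One other way, assuming Python 3.6 or later: parse the message with the email.policy.default policy, under which address headers such as Cc are returned as header objects with an addresses attribute. Each entry is an email.headerregistry.Address with separate display_name and addr_spec fields, which covers the "split name and address into two fields" part of the question. The message text is made up for the example.

```python
import email
from email import policy

# Hypothetical raw message used only to demonstrate the Cc parsing.
raw = (
    "From: someone@example.com\n"
    'Cc: friend@email.com, John Smith <john.smith@email.com>,"Smith, Jane" <jane.smith@uconn.edu>\n'
    "Subject: test\n"
    "\n"
    "body\n"
)

# With policy.default, msg["cc"] is an address header object, not a plain string.
msg = email.message_from_string(raw, policy=policy.default)

# Each entry in .addresses is an email.headerregistry.Address.
cc = [(a.display_name, a.addr_spec) for a in msg["cc"].addresses]
print(cc)
```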

Acorn