
I'm reading the Python 3 docs here and I must be blind or something... Where does it say how to get the body of a message?

I want to open a message and loop over its text-based bodies, skipping binary attachments. Pseudocode:

def read_all_bodies(local_email_file):
    email = Parser().parse(open(local_email_file, 'r'))
    for pseudo_body in email.pseudo_bodies:
        if pseudo_body.pseudo_is_binary():
            continue
        # Pseudo-parse the body here

How do I do that? Is the Message class even the right class for this? Isn't it only for headers?

Tomáš Zato

1 Answer


This is best done using two functions:

  1. One to open the file and parse the message. If the message is single-part, get_payload returns its body as a string. If the message is multipart, it returns a list of sub-messages.
  2. A second one to handle the text payload of each part.
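To see the difference in what get_payload returns, here is a minimal sketch. The two sample messages are made up for illustration and are parsed from strings with email.message_from_string rather than from a file:

```python
from email import message_from_string

# Hypothetical single-part message (illustration only)
single = message_from_string(
    "Subject: hi\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "Hello there\n"
)

# Hypothetical multipart message: one text part, one binary attachment
multi = message_from_string(
    "Subject: hi\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain; charset=utf-8\n"
    "\n"
    "First part\n"
    "--XYZ\n"
    "Content-Type: application/octet-stream\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "AAEC\n"
    "--XYZ--\n"
)

single_payload = single.get_payload()  # a str holding the body
multi_payload = multi.get_payload()    # a list of Message objects

# Each sub-message carries its own headers, including the content type
part_types = [part.get_content_type() for part in multi_payload]
```

The content type of each sub-message is what lets you tell text bodies from binary attachments.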

This is how it can be done:

from email.parser import Parser

def parse_file_bodies(filename):
    # Open the file and parse the email; the with-block closes the file afterwards
    with open(filename, 'r') as f:
        email = Parser().parse(f)
    # For multipart emails, all bodies will be handled in a loop
    if email.is_multipart():
        for msg in email.get_payload():
            parse_single_body(msg)
    else:
        # A single-part message is passed directly
        parse_single_body(email)

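def parse_single_body(email):
    # A sub-part can itself be multipart (e.g. multipart/alternative
    # nested in multipart/mixed), so recurse into it
    if email.is_multipart():
        for part in email.get_payload():
            parse_single_body(part)
        return
    # Skip binary attachments; only text/* parts are of interest
    if email.get_content_maintype() != "text":
        return
    # With decode=True the payload is returned as bytes. It must be
    # converted to a Python string using the part's charset, which
    # may vary from message to message
    payload = email.get_payload(decode=True)
    charset = email.get_content_charset() or "utf-8"
    try:
        text = payload.decode(charset)
        # Now you can work with text as with any other string:
        ...
    except (UnicodeDecodeError, LookupError):
        print("Error: cannot decode message as " + charset)
        return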
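
As an alternative, the same idea can be sketched with Message.walk(), which visits the message and all nested parts in one loop. This is only a sketch under a few assumptions: the function name iter_text_bodies and the sample message are hypothetical, the input is raw bytes parsed with BytesParser instead of a text-mode file, and undecodable characters are replaced rather than reported:

```python
from email.parser import BytesParser

def iter_text_bodies(raw_bytes):
    """Yield (content_type, text) for each text/* part that is not an attachment."""
    msg = BytesParser().parsebytes(raw_bytes)
    for part in msg.walk():
        # Multipart containers hold no body of their own
        if part.is_multipart():
            continue
        # Skip binary parts and anything explicitly marked as an attachment
        if part.get_content_maintype() != "text":
            continue
        if part.get_content_disposition() == "attachment":
            continue
        payload = part.get_payload(decode=True)  # bytes, transfer encoding undone
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:
            # Unknown charset name: fall back to UTF-8 (an assumption)
            text = payload.decode("utf-8", errors="replace")
        yield part.get_content_type(), text

# Hypothetical sample: one Latin-1 text part and one binary attachment
sample = (
    b"Subject: demo\n"
    b"MIME-Version: 1.0\n"
    b'Content-Type: multipart/mixed; boundary="XYZ"\n'
    b"\n"
    b"--XYZ\n"
    b'Content-Type: text/plain; charset="iso-8859-1"\n'
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n"
    b"caf=E9\n"
    b"--XYZ\n"
    b"Content-Type: application/octet-stream\n"
    b"Content-Transfer-Encoding: base64\n"
    b'Content-Disposition: attachment; filename="blob.bin"\n'
    b"\n"
    b"AAEC\n"
    b"--XYZ--\n"
)

bodies = list(iter_text_bodies(sample))
```

Reading the file in binary mode (open(filename, 'rb').read()) and passing the bytes in avoids guessing the file's text encoding before the message's own charset headers have been read.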
Tomáš Zato