Get/Read email message and output plain text

Question

On Windows, using Python 2.7 and Gmail, I am trying to fetch an email and read its body.

# Parse the email message
msg = email.message_from_string(msg_data[0][1].decode('UTF-8'))

# Extract the "FROM" field
from_field = msg['FROM']

# Extract the received timestamp
received_timestamp = msg['Date']

msg_body = None
# Extract the body of the email
if msg.is_multipart():
    for part in msg.walk():
        # if part.get_content_type() == 'text/html':
        if part.get_content_type() == 'text/plain':
            msg_body = part.get_payload(decode=True).decode()
            break
else:
    msg_body = msg.get_payload(decode=True).decode()


# Print the results
print('FROM: {}'.format(from_field))
print('Received: {}'.format(received_timestamp))
print('Body:\n{}'.format(msg_body))

I am able to print the "from_field" and "received_timestamp" variables, so I know it's hitting the right email/message ID.

I tried "text/html", which gives me the email as nicely formatted HTML (but I want it in plain text), and I tried "text/plain", which gives me "None".

I assume I shouldn't need "beautifulsoup" or "re" to fetch the plain text.

I searched this forum and attempted to understand existing posts, but was not able to figure out what is missing in my code.

Any hints or pointers would be helpful - Thanks in advance!

Well, if it's in HTML format, you would just need to convert to plain text, right? — Rojo, Apr 09 '23 at 01:57
Or how would [`as_string()`](https://docs.python.org/3/library/email.compat32-message.html#email.message.Message.as_string) work? — Rojo, Apr 09 '23 at 02:01

score 0 · Answer 1 · answered Apr 09 '23 at 03:23

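Try using msg.get_payload() instead of msg.get_payload(decode=True).decode(). For a non-multipart message, get_payload() returns the payload as a string with no extra decoding step, although the text can still be transfer-encoded (base64 or quoted-printable) if the message declares a Content-Transfer-Encoding. For a multipart message it returns a list of the sub-parts instead.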
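To make the difference between the two calls concrete, here is a minimal sketch. It assumes Python 3 and builds a throwaway message with the standard library's MIMEText and MIMEMultipart rather than fetching one from Gmail; the sample text and variable names are made up for illustration.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# A UTF-8 text part; MIMEText base64-encodes the body for this charset.
part = MIMEText("héllo wörld", "plain", "utf-8")

# Without decode=True you get the payload as stored, i.e. still base64 text.
raw = part.get_payload()

# With decode=True you get bytes with the transfer encoding removed,
# which still have to be decoded using the part's declared charset.
data = part.get_payload(decode=True)
text = data.decode(part.get_content_charset() or "utf-8")

# On a multipart container, get_payload() is a list of sub-parts and
# get_payload(decode=True) is None, so only leaf parts can be decoded.
container = MIMEMultipart("alternative")
container.attach(part)
children = container.get_payload()
```

So plain get_payload() only looks like "the text" when the part happens to be 7bit/8bit encoded; for anything else, decode=True plus the charset is still needed.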

If that doesn't work but text/html is giving you the HTML, the message most likely has no text/plain part at all, which is why msg_body stays None. In that case maybe you can use Python's built-in html library on the HTML part. Something like

import html  # html.unescape requires Python 3.4 or later
html_body = part.get_payload(decode=True).decode()
msg_body = html.unescape(html_body).replace('\r', '').replace('\n', ' ')

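should work, although html.unescape only converts character references such as &amp; and leaves the tags themselves in place, so you would still need to strip those.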
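If the message really has no text/plain part, one way to get plain text without BeautifulSoup or re is the standard library's HTMLParser. The following is only a rough sketch, assuming Python 3 (on 2.7 the parser lives in the HTMLParser module, and get_content_disposition is not available). The names extract_body, html_to_text, _decode_part and _TextExtractor are hypothetical helpers, not part of the email package, and the set of tags treated as line breaks is an arbitrary choice.

```python
import email
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in ("br", "p", "div", "tr", "li"):
            # Treat common block-level tags as line breaks.
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace and drop empty lines.
        joined = "".join(self._chunks)
        lines = (" ".join(line.split()) for line in joined.splitlines())
        return "\n".join(line for line in lines if line)


def html_to_text(html_source):
    """Return the visible text of an HTML string (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html_source)
    parser.close()
    return parser.text()


def _decode_part(part):
    """Undo the transfer encoding, then decode with the declared charset."""
    data = part.get_payload(decode=True)
    if data is None:
        return None
    charset = part.get_content_charset() or "utf-8"
    return data.decode(charset, errors="replace")


def extract_body(msg):
    """Prefer the text/plain part; fall back to text/html stripped of tags."""
    plain = html_body = None
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no text of their own
        if part.get_content_disposition() == "attachment":
            continue
        ctype = part.get_content_type()
        if ctype == "text/plain" and plain is None:
            plain = _decode_part(part)
        elif ctype == "text/html" and html_body is None:
            html_body = _decode_part(part)
    if plain and plain.strip():
        return plain
    if html_body:
        return html_to_text(html_body)
    return None
```

With this in place, msg_body = extract_body(msg) could stand in for the whole if/else block in the question, since msg.walk() also works on non-multipart messages. The output will not be as tidy as a real text/plain part (tables and links are flattened), so treat it as a fallback only.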
