I'm on Windows, using Python 2.7 with Gmail, and I'm trying to fetch an email and read its body.

import email

# Parse the email message
msg = email.message_from_string(msg_data[0][1].decode('UTF-8'))

# Extract the "FROM" field
from_field = msg['FROM']

# Extract the received timestamp
received_timestamp = msg['Date']

msg_body = None
# Extract the body of the email
if msg.is_multipart():
    for part in msg.walk():
        # if part.get_content_type() == 'text/html':
        if part.get_content_type() == 'text/plain':
            msg_body = part.get_payload(decode=True).decode()
            break
else:
    msg_body = msg.get_payload(decode=True).decode()


# Print the results
print('FROM: {}'.format(from_field))
print('Received: {}'.format(received_timestamp))
print('Body:\n{}'.format(msg_body))

I am able to print the "from_field" and "received_timestamp" variables, so I know it's hitting the right email/message ID.

When I match on "text/html", I get the email as nicely formatted HTML (but I want plain text). When I match on "text/plain", msg_body stays "None".

I assume I shouldn't need "beautifulsoup" or "re" just to fetch the plain text.

I searched this forum and attempted to understand existing posts, but was not able to figure out what is missing in my code.

Any hints or pointers would be helpful - Thanks in advance!

Aksen P

1 Answer

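Try using msg.get_payload() instead of msg.get_payload(decode=True).decode(). Without decode=True, get_payload() returns the payload as a string, so no extra .decode() call is needed. Keep in mind that the string is returned as-is: if the part has a Content-Transfer-Encoding of base64 or quoted-printable it will still be encoded, and for a multipart message get_payload() returns a list of parts rather than text. Also check whether the message has a text/plain part at all; if it is HTML-only, your loop never assigns msg_body, which would explain the None.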
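To make the part-selection logic concrete, here is a minimal sketch that prefers the first text/plain part and falls back to the first text/html part when no plain-text part exists. The helper name extract_body and the sample message are assumptions for illustration (the sample is built locally rather than fetched from Gmail), and the output shown by print is from Python 3; treat it as a starting point rather than the definitive fix.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def extract_body(msg):
    """Hypothetical helper: return (content_type, text) for the first
    text/plain part, else the first text/html part, else (None, None)."""
    fallback = None
    # walk() also yields the message itself, so this covers
    # non-multipart messages without a separate branch.
    for part in msg.walk():
        if part.get_content_maintype() == 'multipart':
            continue  # containers carry no text of their own
        if part.get_filename():
            continue  # skip attachments
        ctype = part.get_content_type()
        if ctype not in ('text/plain', 'text/html'):
            continue
        raw = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if raw is None:
            continue
        # Assumption: fall back to UTF-8 when the part declares no charset.
        charset = part.get_content_charset() or 'utf-8'
        text = raw.decode(charset, 'replace')
        if ctype == 'text/plain':
            return ctype, text
        if fallback is None:
            fallback = (ctype, text)
    return fallback if fallback else (None, None)


# Build a sample message with both alternatives and round-trip it
# through a string, roughly as a fetched message would be parsed.
sample = MIMEMultipart('alternative')
sample.attach(MIMEText('Hello in plain text', 'plain', 'utf-8'))
sample.attach(MIMEText('<p>Hello in <b>HTML</b></p>', 'html', 'utf-8'))
parsed = message_from_string(sample.as_string())

print(extract_body(parsed))
```

If this returns 'text/html' for your message, the sender simply did not include a plain-text alternative, and the HTML has to be converted to text.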

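If that doesn't work but text/html is giving you the HTML, you can fall back to the HTML part and use Python's built-in html module on it. Two caveats: html.unescape() only exists in Python 3.4 and later (on Python 2.7 the closest equivalent is HTMLParser.HTMLParser().unescape()), and it only converts character references such as &amp; back to characters. It does not remove tags, so you would still need to strip the markup separately. Something like

import html

html_body = part.get_payload(decode=True).decode()
msg_body = html.unescape(html_body).replace('\r', '').replace('\n', ' ')

gets you part of the way.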
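For stripping the tags themselves without BeautifulSoup or regular expressions, one option is the standard library's HTML parser. The sketch below is an illustration, not a complete HTML-to-text converter: TextExtractor and html_to_text are made-up names, and it is written with Python 3 in mind, where the parser converts character references for you by default.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect text nodes, skipping <script>/<style>."""

    def __init__(self):
        HTMLParser.__init__(self)
        self._skip = 0      # depth inside <script>/<style>
        self._chunks = []   # text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # On Python 3, entities such as &amp; arrive here already converted.
        # On Python 2.7 they go to handle_entityref/handle_charref instead,
        # so they would be dropped unless those methods are overridden too.
        if not self._skip:
            self._chunks.append(data)

    def text(self):
        # Join fragments with spaces, then collapse runs of whitespace.
        return ' '.join(' '.join(self._chunks).split())


def html_to_text(html_body):
    parser = TextExtractor()
    parser.feed(html_body)
    parser.close()  # flush any text still buffered at the end
    return parser.text()


print(html_to_text('<p>Hello &amp; <b>welcome</b></p><script>x()</script>'))
```

Because fragments are joined with spaces, a word split by inline markup (for example wel<b>come</b>) comes out as two words; that trade-off keeps block elements like adjacent paragraphs from running together.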

ingenium21