Exchangelib Python extracting emails as HTML but I want plain text

Question

I am new to retrieving emails in Python (my Python experience is mostly with ArcGIS). However, I have been assigned a task to continually watch an email address for an incoming email with a particular subject and to extract just a few things from that email. I think I can do that pretty easily. However, I am using Exchangelib for Python, and when I pull emails and print the body, I get a whole bunch of HTML code with it. This happens with every email I pull from Python. Is there a way to use something like BeautifulSoup to strip the HTML and get plain text? If so, how?

from exchangelib import DELEGATE, Account, Credentials
from bs4 import BeautifulSoup

credentials = Credentials(
    username='user.name@company.com', #Microsoft Office 365 requires you to use user.name@domain for username
    password='MyS3cretP@$$w0rd'          #Others requires DOMAIN\User.Name
)
account = Account(
    primary_smtp_address='primary.email@company.com',
    credentials=credentials,
    autodiscover=True,
    access_type=DELEGATE
)

# Print first <number selected> inbox messages in reverse order
for item in account.inbox.all().order_by('-datetime_received')[:1]:
    print(item.subject, item.body)

I am also attaching two images: one of what the email looks like, and the other of what Python is printing.

Again, what I want to learn is how to get Python to output the email as plain text.

UPDATE: This was just a test email to show you all the HTML that is being generated with Exchangelib. Eventually, emails will look something like this:

Outage Request Number:  1-001111
Outage Request Status:  Completed
Status Updated By:  Plant
Requested Equipment:     Hose
Planned Start:  Outage: 01/01/2000 01:00
Planned End:    Outage: 01/01/2000 02:00
Actual Start:   01/01/2000 01:00
Actual Completion:  01/01/2000 02:00
Duration:   Exactly 1.00 Hour(s)
Continuous
Outage Request Priority:    Forced
Request Updated:    01/01/2000 00:01

Python Output

Can you elaborate on what you are extracting? This is HTML so you should be able to use BeautifulSoup to parse text out of the message. — Kyle, Oct 13 '17 at 18:50
Sure. So right now I just have this set up for my personal company email address. Once I tinker with it, I will add the creds for the email I will be listening to. But I will be getting emails periodically that will look something like this: Status Updated By: Requested Equipment: Planned Start: Planned End: Actual Start: Actual Completion: : Essentially, this is a ticketing system that I will be using for ArcGIS. However, we don't have an API available to — user38508, Oct 13 '17 at 18:52
Are these emails coming from end users or is this getting programmatically generated somewhere else? I'm asking because different email clients will format this stuff very differently, thus making it tricky to search for a particular part with BS4. — Kyle, Oct 13 '17 at 18:56
tie into. So we are basically having to do this from ground up. I just need this HTML out of the way, because I will need to put that info into a spreadsheet at some point in time. So I know I'm going to have to parse something. — user38508, Oct 13 '17 at 18:56
Heck, maybe just using regular expressions would be the best way to do it. — Kyle, Oct 13 '17 at 18:58
Plant managers are sending this info through a system. Basically it's a form that is then generated by another system that will send the emails out. — user38508, Oct 13 '17 at 18:58
I would try to get a sample of that particular email from that other system and build your parser off of that. Sending yourself an email via Outlook is going to add a bunch of extra junk to the email. — Kyle, Oct 13 '17 at 19:00
This isn't looking at Outlook though. This is pulling it from the Exchange server, much like what we will be doing with the other email address. Outlook is not involved at all on this one. — user38508, Oct 13 '17 at 19:11

score 4 · Answer 1 · answered Oct 18 '17 at 07:24

exchangelib supports the text_body attribute on some Exchange server versions. It is the server's attempt at cleaning up the HTML and presenting a plain-text version of the email message, so you may find it useful.
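
A minimal sketch of using text_body with a fallback, assuming the item exposes it as an ordinary attribute that may be missing or empty on older servers. The helper name body_as_text is hypothetical (it is not part of exchangelib), and the SimpleNamespace objects only stand in for real messages so the snippet runs without a server:

```python
from types import SimpleNamespace


def body_as_text(item):
    """Prefer the server-generated plain text; fall back to the raw body."""
    # text_body may be missing or empty on older Exchange versions,
    # so read it defensively instead of assuming it is set.
    text = getattr(item, "text_body", None)
    return text if text else item.body


# Stand-ins for exchangelib items (assumption: real items expose the same attributes)
with_text = SimpleNamespace(
    body="<p>Duration: 1.00 Hour(s)</p>",
    text_body="Duration: 1.00 Hour(s)",
)
html_only = SimpleNamespace(body="<p>Duration: 1.00 Hour(s)</p>", text_body=None)

print(body_as_text(with_text))
print(body_as_text(html_only))
```

In the loop from the question, print(item.subject, body_as_text(item)) would then show the text version whenever the server provides one.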

If not, the author just sent you an HTML email message, and you'll have to deal with that and extract the information you need. BeautifulSoup is perfect for that. Just parse the message body and start extracting:

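from bs4 import BeautifulSoup
item = my_account.inbox.get(subject='My special email')  # expects exactly one match
# Name the parser explicitly; without it bs4 picks one and warns
soup = BeautifulSoup(item.body, 'html.parser')
paragraphs = soup.find_all('p')  # each tag's text is available via .get_text()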
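
To go from there to plain text and individual fields, one possible sketch follows. It assumes the real messages carry the "Label: value" lines shown in the question's update, one per HTML block element. The sample HTML, the helper names html_to_text and parse_fields, and the regular expression are illustrative assumptions, not taken from the actual system's emails:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical stand-in for item.body; real messages will contain more markup.
SAMPLE_HTML = """
<html><body>
<p>Outage Request Number:  1-001111</p>
<p>Outage Request Status:  Completed</p>
<p>Planned Start:  Outage: 01/01/2000 01:00</p>
<p>Continuous</p>
</body></html>
"""

# Label is everything before the first colon; value is the rest of the line.
FIELD_RE = re.compile(r"^\s*([^:\n]+?):\s*(.+?)\s*$")


def html_to_text(html):
    """Drop all tags and put each text fragment on its own line."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.get_text(separator="\n")


def parse_fields(text):
    """Collect 'Label: value' lines into a dict; other lines are skipped."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


fields = parse_fields(html_to_text(SAMPLE_HTML))
print(fields)
```

Splitting on the first colon keeps values such as "Outage: 01/01/2000 01:00" intact, and lines without a colon (like "Continuous") are ignored. The resulting dict can be written out as a spreadsheet row; if the real emails lay the fields out differently (for example in table cells), the parsing step would need adjusting.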
