
I am trying to parse all the .msg files stored in a specific folder and save their body text to a dataframe. However, when I extract the body of an email, the text of the emails attached to it is extracted as well. I want only the body of the first (top-level) email in each .msg file.

# Source: https://stackoverflow.com/questions/52608069/parsing-multiple-msg-files-and-storing-the-body-text-in-a-csv-file
# Reading multiple .msg files using Python
from pathlib import Path
import win32com.client

outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")

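# Assuming \Documents\Email Reader is the directory containing the files
for p in Path(r'C:\Users\XY\Documents\Email Reader').iterdir():
    # compare the suffix case-insensitively so .MSG files are picked up too
    if p.is_file() and p.suffix.lower() == '.msg':
        # the COM call needs a plain string path, not a pathlib.Path
        msg = outlook.OpenSharedItem(str(p))
        print(msg.Body)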
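A minimal sketch of the whole loop, assuming the extra emails are the quoted reply/forward history inside `Body` (Outlook puts the earlier messages of a thread there) rather than real attachments, and that this history starts at one of a few Outlook-style separator lines. The separator patterns and the names `first_message_body` and `collect_bodies` are hypothetical; `read_body` is any callable that returns a message's body text, for example `lambda p: outlook.OpenSharedItem(str(p)).Body` with the `outlook` object from the question.

```python
import re
from pathlib import Path
from typing import Callable, Dict, List

# Hypothetical heuristics: lines that typically start the quoted history
# in an Outlook plain-text body. Adjust them to match your own mails.
_THREAD_SEPARATOR = re.compile(
    r"^(?:-{2,}\s*Original Message\s*-{2,}|From:[ \t])",
    re.IGNORECASE | re.MULTILINE,
)


def first_message_body(body: str) -> str:
    """Return the text before the first quoted-thread separator, stripped."""
    match = _THREAD_SEPARATOR.search(body)
    return (body[: match.start()] if match else body).strip()


def collect_bodies(
    folder: Path,
    read_body: Callable[[Path], str],
) -> List[Dict[str, str]]:
    """Read every .msg file in `folder` and keep only its top-level body."""
    rows = []
    for p in sorted(folder.iterdir()):
        # suffix.lower() also accepts files saved as .MSG
        if p.is_file() and p.suffix.lower() == ".msg":
            rows.append({"file": p.name, "body": first_message_body(read_body(p))})
    return rows
```

The reader is passed in instead of calling Outlook directly so the trimming logic can be checked without Outlook installed. `pandas.DataFrame(rows)` then gives one row per file with `file` and `body` columns.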

2 Answers


I had a similar requirement. Full code is here: https://medium.com/@theamazingexposure/accessing-shared-mailbox-using-exchangelib-python-f020e71a96ab

For your purpose, I think this snippet will work. It reads the latest message whose subject contains a specific string:

from exchangelib import Credentials, Account

credentials = Credentials('First_Name.Last_Name@some_domain.com', 'Your_Password_Here')
account = Account('First_Name.Last_Name@some_domain.com', credentials=credentials, autodiscover=True)
filtered_items = account.inbox.filter(subject__contains='Your Search String Here')
print("Getting latest email from Given Search String...")
# newest first, then take only one item
for item in filtered_items.order_by('-datetime_received')[:1]:
    # the body is item.text_body; .encode('UTF-8') only affects how it is printed
    print(item.subject, item.text_body.encode('UTF-8'), item.sender, item.datetime_received)
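To get from this loop to the dataframe the question asks for, one option is to flatten each item into a plain dict first. A minimal sketch; `items_to_rows` is a hypothetical helper that relies only on the four attributes used above (`subject`, `text_body`, `sender`, `datetime_received`). It keeps `text_body` as `str` rather than `bytes`, and it assumes `text_body` or `sender` may be `None` for some items, so it guards against both.

```python
from typing import Any, Dict, Iterable, List


def items_to_rows(items: Iterable[Any]) -> List[Dict[str, Any]]:
    """Flatten mail items into dicts suitable for pandas.DataFrame(rows)."""
    rows = []
    for item in items:
        rows.append(
            {
                "subject": item.subject,
                # keep the body as str; fall back to "" if it is missing
                "body": item.text_body or "",
                # sender is a Mailbox object in exchangelib; keep only the address
                "sender": getattr(item.sender, "email_address", None),
                "received": item.datetime_received,
            }
        )
    return rows
```

For example, `items_to_rows(filtered_items.order_by('-datetime_received')[:1])` produces rows that `pandas.DataFrame` accepts directly.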

from exchangelib import Credentials, Account

credentials = Credentials('First_Name.Last_Name@some_domain.com', 'Your_Password_Here')
account = Account('First_Name.Last_Name@some_domain.com', credentials=credentials, autodiscover=True)
# get the text body of the latest unread mail (fails if there is no unread mail)
mail_body = account.inbox.filter(is_read=False).order_by('-datetime_received')[0].text_body
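Indexing with `[0]` fails (typically with an `IndexError`) when the inbox has no unread mail. A minimal sketch of a guard, assuming only that the query result can be sliced and iterated, as the `[:1]` slice in the other answer does; `first_or_none` is a hypothetical helper name.

```python
from typing import Iterable, Optional, TypeVar

T = TypeVar("T")


def first_or_none(results: Iterable[T]) -> Optional[T]:
    """Return the first element of an iterable, or None if it is empty."""
    return next(iter(results), None)


# Usage with the query above (slice first so only one item is fetched):
# latest = first_or_none(
#     account.inbox.filter(is_read=False).order_by('-datetime_received')[:1]
# )
# mail_body = latest.text_body if latest is not None else None
```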
Erik Cederstrand
ShinNShirley