I have a question about saving email data in batches using exchangelib. Processing currently takes a long time when there are many emails, and after a few minutes it throws this error:

    ERROR:    MemoryError:
    Retry: 0
    Waited: 10
    Timeout: 120
    Session: 25999
    Thread: 28148
    Auth type: <requests.auth.HTTPBasicAuth object at 0x1FBFF1F0>
    URL: https://outlook.office365.com/EWS/Exchange.asmx
    HTTP adapter: <requests.adapters.HTTPAdapter object at 0x1792CE68>
    Allow redirects: False
    Streaming: False
    Response time: 411.93799999996554
    Status code: 503
    Request headers: {'X-AnchorMailbox': 'myworkemail@workdomain.com'}
    Response headers: {}

Here is the code that I use for connecting and reading:

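from bs4 import BeautifulSoup as bs
from exchangelib import (
    DELEGATE, Account, Configuration, Credentials, EWSDateTime, EWSTimeZone,
)

def connect_mail():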
    config = Configuration(
        server="outlook.office365.com",
        credentials=Credentials(
            username="myworkemail@workdomain.com", password="*******"
        ),
    )
    return Account(
        primary_smtp_address="myworkemail@workdomain.com",
        config=config,
        access_type=DELEGATE,
    )

def import_email(account):
    tz = EWSTimeZone.localzone()
    start = EWSDateTime(2020, 10, 26, 22, 15, tzinfo=tz)
    for item in account.inbox.filter(
        datetime_received__gt=start, is_read=False
    ).order_by("-datetime_received"):
        email_body = item.body
        email_subject = item.subject
        soup = bs(email_body, "html.parser")
        tables = soup.find_all("table")
        item.is_read = True
        item.save()
        # Some code here for saving the email to a database
Erik Cederstrand
Chandan

1 Answer

You're getting a MemoryError, which means that Python cannot allocate any more memory on your machine.

There are a couple of things you can do to reduce the memory consumption of your script. One is to use .iterator(), which disables internal caching of your query results. Another is to fetch only the fields you actually need, using .only().

When you're using .only(), the other fields will be None, so remember to save only the one field you actually changed: item.save(update_fields=['is_read'])

Here's an example of how to use the two improvements:

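for item in account.inbox.filter(
    datetime_received__gt=start, is_read=False,
).only(
    'is_read', 'subject', 'body',
).order_by('-datetime_received').iterator():
    # Process item.subject and item.body here, then mark the item as read
    item.is_read = True
    item.save(update_fields=['is_read'])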
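
If you also want to handle the results in fixed-size groups, one possible approach is to wrap the lazy iterator in a small batching helper. This is only a sketch using the standard library: `in_batches`, `process_in_batches` and `handle_batch` are hypothetical names, not part of exchangelib, and the helper works on any iterable.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def in_batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` items, consuming `items` lazily."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        # islice pulls at most `size` items, so only one batch is held at a time
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def process_in_batches(
    items: Iterable[T], size: int, handle_batch: Callable[[List[T]], None]
) -> int:
    """Call `handle_batch` once per batch and return the number of items seen."""
    total = 0
    for batch in in_batches(items, size):
        handle_batch(batch)  # hypothetical callback, e.g. one database write
        total += len(batch)
    return total
```

With the query above, you would pass the result of .iterator() as `items` and do the database write inside `handle_batch`, so that at most `size` items are in memory at any time. The batch size is a trade-off you would have to tune yourself: smaller batches use less memory, larger ones mean fewer database round trips.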
Erik Cederstrand
  • Thanks @erik-cederstrand, the code above works fine. However, I would still like to hear your thoughts on batch processing of emails. I want to save the email data into MongoDB through a cron job, and the code above might not work once it is deployed to the server. That's why I want to read the emails in batches. – Chandan Oct 29 '20 at 09:30
  • I would suggest that you find or open a new question for that, since saving data to MongoDB in batches doesn't have anything to do with exchangelib. – Erik Cederstrand Oct 30 '20 at 08:19