
I am using exchangelib to fetch emails. My function retrieves nearly 150,000 emails (the result is a QuerySet) in a second. I have to convert the QuerySet into a JSON array for further processing. Right now the conversion from the QuerySet object into a JSON array takes about an hour; I want this conversion to happen in seconds.

Sample code:

from datetime import timedelta
from exchangelib import Q, UTC_NOW

MailsArray = []
query_filter = Q(sender='xyz@abc.com')
timeLimit = UTC_NOW() - timedelta(hours=1)

# This step returns data in seconds
Inbox_mails = account.inbox.all().filter(query_filter, datetime_received__gt=timeLimit).only('subject', 'sender', 'conversation_id')

# This step takes a lot of time
for x in Inbox_mails:
    MailsArray.append({"Subject": x.subject, "ID": x.conversation_id.id})
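
For reference, the final serialization step would then be something like this (a minimal sketch using the standard library json module; it is not shown in the question):

import json

# Serialize the list of dicts into a JSON array string
mails_json = json.dumps(MailsArray)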

Any ideas on converting the QuerySet data into a JSON array quickly would be appreciated.

  • I don't know exchangelib, but it looks like your `Inbox_mails` is only a query definition object, which means the actual query is performed when you exhaust it, e.g. by iterating over it. The simple creation of 150,000 dicts in Python shouldn't take very long. – Jeronimo Jun 25 '21 at 11:20

1 Answer


As Jeronimo hinted, the creation of `Inbox_mails` doesn't actually fetch the emails. It's just a queryset definition, and the actual fetching of items happens when you iterate over the `Inbox_mails` object. Your problem is not the conversion to JSON, but the fetching of the data.
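
One quick way to see this (a minimal sketch; `account`, `query_filter` and `timeLimit` are assumed to be set up as in the question) is to time the two steps separately:

import time

# Building the queryset is fast; no request is sent to Exchange here
t0 = time.monotonic()
qs = account.inbox.all().filter(query_filter, datetime_received__gt=timeLimit).only('subject', 'sender', 'conversation_id')
print("build queryset:", time.monotonic() - t0)

# Iterating the queryset is what actually fetches the items from the server
t0 = time.monotonic()
items = [{"Subject": x.subject, "ID": x.conversation_id.id} for x in qs]
print("fetch items:", time.monotonic() - t0)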

150,000 emails is a lot, and fetching them is probably going to be slow no matter how you do it. But you can try changing the paging size used to fetch items. See https://ecederstrand.github.io/exchangelib/#paging for how to do that.
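
As a sketch (based on the paging section of the docs; the value 1000 is just an example, and `query_filter`/`timeLimit` come from the question), the page size can be set on the queryset before iterating it:

qs = account.inbox.all().filter(query_filter, datetime_received__gt=timeLimit).only('subject', 'sender', 'conversation_id')

# Larger pages mean fewer round-trips to the server (the server may cap the effective page size)
qs.page_size = 1000

MailsArray = [{"Subject": x.subject, "ID": x.conversation_id.id} for x in qs]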

Erik Cederstrand
  • I retrieved only the necessary fields. Since the fields were quite small in memory (maybe), the iteration became very fast; I got the entire 300,000 entries iterated within 1-2 seconds. – gow_r Sep 22 '22 at 12:31