
I want to fetch multiple mail messages from Gmail by their IDs, and I'm using IMAP for that. It worked great until I tried to speed things up with a ThreadPoolExecutor (even with just 2 workers); now the fetches fail with the errors below.

If anyone can think of the reason for this problem, it would be very helpful.

connector.py

import concurrent.futures

def create_data_models(self, msgs_ids, max_workers=2):
    data_models = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to the message id it is fetching.
        futures = {executor.submit(self.imapService.fetch_message, msg_id): msg_id
                   for msg_id in msgs_ids}
        for future in concurrent.futures.as_completed(futures):
            try:
                data = future.result()
                data_model = self._create_msg_obj1(data)
                data_models.append(data_model)
            except Exception as exc:
                print('generated an exception: %s' % (exc))

    return data_models

imap_service.py

    def fetch_message(self, msg_id):
        _, data = self.mail.uid('fetch', msg_id, '(RFC822)')
        _, mail_data = data[0]
        logger.info('Successfully fetched message from {} with {} id'.format(self.email_address, msg_id))
        print('Successfully fetched message from {} with {} id'.format(self.email_address, msg_id))
        return mail_data

The errors:

generated an exception: command: UID => unexpected response: b'v5oJ8jIW2KPDzS/DHMN+58yL3Q'
generated an exception: command: UID => unexpected response: b'X-Google-Smtp-Source: APXvYqwtp0c9H1jJ8uZTRW5ZBjIyNbqRgeoGUq8heUUf00uNRHW9Nvyl1dTjYZAy6ijzL29+PTL7'
generated an exception: command: UID => unexpected response: b'X-Received: by 2002:a50:9223:: with SMTP id i32mr19845429eda.40.1566127216934;'
generated an exception: command: UID => unexpected response: b'        Sun, 18 Aug 2019 04:20:16 -0700 (PDT)'
generated an exception: command: UID => unexpected response: b'le.com; s=arc-20160816;'
generated an exception: command: UID => unexpected response: b'        h=message-id:to:sender:from:reply-to:subject:date:mime-version'
Sprint21
  • You cannot share a connection between multiple threads: they are not thread safe. You will need multiple connections. However, to reduce latency, you should fetch multiple messages per command by sending comma-separated message IDs. This will give you a major speed-up, since most fetches are limited by latency, not bandwidth. – Max Aug 16 '20 at 15:15
  • Thank you, it really helped me. Now my problem is that I can't get the body of the message split into plain text and HTML. Do you have any idea how I can do that? @Max – Sprint21 Aug 16 '20 at 19:22
  • Well, start by using email.message_from_bytes to parse it into the MIME tree. You can walk the tree. If you use the new-style parsers, they should be able to guess what the body or bodies are. – Max Aug 17 '20 at 00:30
  • I will try, thank you @Max – Sprint21 Aug 18 '20 at 12:41
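Max's two suggestions (one connection per thread, several UIDs per command) could be combined roughly as follows. This is a minimal sketch, not the asker's code: `fetch_all`, `chunks`, `parse_fetch_response` and `make_gmail_connect` are hypothetical names, the default `batch_size` of 50 is an arbitrary assumption, and it assumes the server reports `UID n` before each message literal (if it reports it elsewhere, that entry's UID comes back as `None`).

```python
import concurrent.futures
import imaplib
import re
import threading

_UID_RE = re.compile(rb"UID (\d+)")


def chunks(items, size):
    """Split a list into consecutive lists of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def parse_fetch_response(data):
    """Turn an imaplib UID FETCH response into a list of (uid, raw_bytes) pairs."""
    messages = []
    for item in data:
        # Each message arrives as a (header_bytes, literal_bytes) tuple;
        # the bare b')' separators between them are skipped.
        if isinstance(item, tuple):
            match = _UID_RE.search(item[0])
            uid = match.group(1).decode() if match else None
            messages.append((uid, item[1]))
    return messages


def make_gmail_connect(user, password, mailbox="inbox"):
    """Hypothetical factory: returns a callable that opens one logged-in connection."""
    def connect():
        conn = imaplib.IMAP4_SSL("imap.gmail.com")
        conn.login(user, password)
        conn.select(mailbox)
        return conn
    return connect


def fetch_all(uids, connect, batch_size=50, max_workers=2):
    """Fetch RFC822 messages by UID with one IMAP connection per worker thread."""
    uids = [u.decode() if isinstance(u, bytes) else str(u) for u in uids]
    local = threading.local()
    opened = []
    lock = threading.Lock()

    def get_conn():
        # imaplib connections are not thread safe, so each thread opens its own.
        if not hasattr(local, "conn"):
            local.conn = connect()
            with lock:
                opened.append(local.conn)
        return local.conn

    def fetch_batch(batch):
        # One UID FETCH per batch, e.g. "101,102,103", to save round trips.
        status, data = get_conn().uid("fetch", ",".join(batch), "(RFC822)")
        if status != "OK":
            raise RuntimeError("UID FETCH failed for %s: %r" % (batch, data))
        return parse_fetch_response(data)

    results = []
    try:
        with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
            # pool.map yields batch results in submission order.
            for batch_result in pool.map(fetch_batch, chunks(uids, batch_size)):
                results.extend(batch_result)
    finally:
        # Log out every connection the worker threads opened.
        for conn in opened:
            try:
                conn.logout()
            except Exception:
                pass
    return results
```

Usage would look like `fetch_all(msgs_ids, make_gmail_connect(user, password))`, after which each raw message can go through `_create_msg_obj1` as before. Keep `max_workers` small, since Gmail limits how many IMAP connections one account may have open at once.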

1 Answer


Use this (it reads each message and splits out the plain-text and HTML bodies):

import imaplib
import email

host = input('Your Hostname:  ')
username = input('Your Username:  ')
password = input('Your Password:  ')


def get_inbox():
    mail = imaplib.IMAP4_SSL(host)
    mail.login(username, password)
    mail.select("inbox")
    _, search_data = mail.search(None, 'SEEN')
    my_message = []
    for num in search_data[0].split():
        email_data = {}
        _, data = mail.fetch(num, '(RFC822)')
        _, b = data[0]
        email_message = email.message_from_bytes(b)
        for header in ['subject', 'to', 'from', 'date']:
            print("{}: {}".format(header, email_message[header]))
            email_data[header] = email_message[header]
        for part in email_message.walk():
            # Skip attached files that happen to be text/plain or text/html.
            if part.get_content_disposition() == 'attachment':
                continue
            # Decode with the part's declared charset instead of assuming UTF-8.
            charset = part.get_content_charset() or 'utf-8'
            if part.get_content_type() == "text/plain":
                body = part.get_payload(decode=True)
                email_data['body'] = body.decode(charset, errors='replace')
            elif part.get_content_type() == "text/html":
                html_body = part.get_payload(decode=True)
                email_data['html_body'] = html_body.decode(charset, errors='replace')
        my_message.append(email_data)
    mail.logout()
    return my_message


if __name__ == "__main__":
    my_inbox = get_inbox()
    print(my_inbox)
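Following the hint about the new-style parsers, the `walk()` loop in this answer could instead use `email.policy.default` together with `EmailMessage.get_body()`, which selects the plain-text and HTML parts and decodes their charset for you. This is a minimal sketch; `split_bodies` is a hypothetical helper name, and a message without a given text part returns `None` in that slot.

```python
import email
from email import policy


def split_bodies(raw_bytes):
    """Return (plain_text, html_text) for a raw RFC 822 message; either may be None."""
    # policy.default makes message_from_bytes return an EmailMessage,
    # which provides get_body() and get_content().
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    plain_part = msg.get_body(preferencelist=("plain",))
    html_part = msg.get_body(preferencelist=("html",))
    # get_content() undoes the transfer encoding and decodes the charset.
    plain = plain_part.get_content() if plain_part is not None else None
    html = html_part.get_content() if html_part is not None else None
    return plain, html
```

Inside the answer's loop, `email_data['body'], email_data['html_body'] = split_bodies(b)` would take the place of the `walk()` block.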
  • Thank you, but I don't see how this helps with the concurrency part. As I said, the whole thing works for me without concurrency, but I want to do it concurrently. – Sprint21 Aug 15 '20 at 12:16