The question is twofold: getting the MESSAGE-ID header, and using imap_tools. For a handmade email client in Python I need to reduce the amount of data read from the server (at present it takes 2 minutes to read the whole mailbox folder of ~170 messages on Yahoo). I believe that having MESSAGE-ID will help me.

imap_tools supports the IDLE command, which is essential to keep the Yahoo server connection alive, and has other features which I believe will simplify the code.

To learn about MESSAGE-ID I started with the following code (file fetch_ssl.py):

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import imaplib
import email
import os
import ssl
import conf
# Why UID==1 has no MESSAGE-ID ?
if __name__ == '__main__':
    args = conf.parser.parse_args()
    host, port, env_var = conf.config[args.host]
    if 0 < args.verbose:
        print(host, port, env_var)
    with imaplib.IMAP4_SSL(host, port,
                           ssl_context=ssl.create_default_context()) as mbox:
        user, pass_ = os.getenv('USER_NAME_EMAIL'), os.getenv(env_var)
        mbox.login(user, pass_)
        mbox.select()
        typ, data = mbox.search(None, 'ALL')
        for num in data[0].split():
            typ, data = mbox.fetch(num, '(RFC822)')
            msg = email.message_from_bytes(data[0][1])
            print(f'num={int(num)}, MESSAGE-ID={msg["MESSAGE-ID"]}')
            ans = input('Continue[Y/n]? ')
            if ans.upper() in ('', 'Y'):
                continue
            else:
                break

Where conf.py is:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import argparse

HOST = 'imap.mail.yahoo.com'
PORT = 993
config = {'gmail': ('imap.gmail.com', PORT, 'GMAIL_APP_PWD'),
          'yahoo': ('imap.mail.yahoo.com', PORT, 'YAHOO_APP_PWD')}
parser = argparse.ArgumentParser(description="""\
Fetch MESSAGE-ID from imap server""")
parser.add_argument('host', choices=config)
parser.add_argument('-verbose', '-v', action='count', default=0)

fetch_ssl.py outputs:

$ python fetch_ssl.py yahoo
num=1, MESSAGE-ID=None
Continue[Y/n]? 
num=2, MESSAGE-ID=<83895140.288751@communications.yahoo.com>
Continue[Y/n]? n

I'd like to understand why the first message (num == 1, which is a message sequence number rather than a true UID) has no MESSAGE-ID. Does that happen from time to time, i.e. are there messages with no MESSAGE-ID? How should I handle these cases? I haven't found any such cases on Gmail.

Then I attempted to do the same with imap_tools (version 0.56.0), in file fetch_tools.py:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
import ssl
from imap_tools import MailBoxTls
import conf

# https://github.com/ikvk/imap_tools/blob/master/examples/tls.py
# advices
# ctx.load_cert_chain(certfile="./one.crt", keyfile="./one.key")
if __name__ == '__main__':
    args = conf.parser.parse_args()
    host, port, env_var = conf.config[args.host]
    if 0 < args.verbose:
        print(host, port, env_var)
    user, pass_ = os.getenv('USER_NAME_EMAIL'), os.getenv(env_var)
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.options &= ~ssl.OP_NO_SSLv3
    # imaplib.abort: socket error: EOF
    with MailBoxTls(host=host, port=port, ssl_context=ctx) as mbox:
        mbox.login(user, pass_, 'INBOX')
        for msg in mbox.fetch():
            print(msg.subject, msg.date_str)

Command

$ python fetch_tools.py yahoo

outputs:

Traceback (most recent call last):
  File "/home/vlz/Documents/python-scripts/programming_python/Internet/Email/ymail/imap_tools_lab/fetch_tools.py", line 20, in <module>
    with MailBoxTls(host=host, port=port, ssl_context=ctx) as mbox:
  File "/home/vlz/Documents/.venv39/lib/python3.9/site-packages/imap_tools/mailbox.py", line 322, in __init__
    super().__init__()
  File "/home/vlz/Documents/.venv39/lib/python3.9/site-packages/imap_tools/mailbox.py", line 35, in __init__
    self.client = self._get_mailbox_client()
  File "/home/vlz/Documents/.venv39/lib/python3.9/site-packages/imap_tools/mailbox.py", line 328, in _get_mailbox_client
    client = imaplib.IMAP4(self._host, self._port, self._timeout)  # noqa
  File "/usr/lib/python3.9/imaplib.py", line 205, in __init__
    self._connect()
  File "/usr/lib/python3.9/imaplib.py", line 247, in _connect
    self.welcome = self._get_response()
  File "/usr/lib/python3.9/imaplib.py", line 1075, in _get_response
    resp = self._get_line()
  File "/usr/lib/python3.9/imaplib.py", line 1185, in _get_line
    raise self.abort('socket error: EOF')
imaplib.abort: socket error: EOF

Command

$ python fetch_tools.py gmail

Produces identical results. What are my mistakes?

Using Python 3.9.2, Debian GNU/Linux 11 (bullseye), imap_tools (Version: 0.56.0)

EDIT

Headers from the message with no MESSAGE-ID

X-Apparently-To: vladimir.zolotykh@yahoo.com; Sun, 25 Oct 2015 20:54:21 +0000
Return-Path: <mail@product.communications.yahoo.com>
Received-SPF: fail (domain of product.communications.yahoo.com does not designate 216.39.62.96 as permitted sender)
...
X-Originating-IP: [216.39.62.96]
Authentication-Results: mta1029.mail.bf1.yahoo.com  from=product.communications.yahoo.com; domainkeys=neutral (no sig);  from=product.communications.yahoo.com; dkim=pass (ok)
Received: from 127.0.0.1  (EHLO n3-vm4.bullet.mail.gq1.yahoo.com) (216.39.62.96)
  by mta1029.mail.bf1.yahoo.com with SMTPS; Sun, 25 Oct 2015 20:54:21 +0000
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=product.communications.yahoo.com; s=201402-std-mrk-prd; t=1445806460; bh=5PTgF8Jghm92xeMD5mSHp6A3eRVV70PWo1oQ15K7Tfk=; h=Date:From:Reply-To:To:Subject:From:Subject; b=D7ItgOiuLbiexJGHvORgbpRi22X+sYso6gwZKDXVca79DxMMy2R1dUtZTIg7tcft1lovVJUDw/7fC51orDltRidlfnpayeY8lT+94DRlSBwopuxgOqqR9oTTjTBZ0oEvdxUcXl/q54N2GxuBFvmg8UO0OZoCnFPpUVYo9x4arMjt/0TOW1Q5d/yjdmO7iwiued/rliP/Bsq0TaZYcb0oCAT7Q50tb1fB7wcXLYNSC1OCQ1l1LajbUqmU1LWWNse36mUUTBieO2sZT0ERFrHaCTaTNQSXKQG2AxYF7Dd/8i0Iq3xqdcS0bDpjmWE25uoKvCdtXtUbylsuQSChuLFMTw==
Received: from [216.39.60.185] by n3.bullet.mail.gq1.yahoo.com with NNFMP; 25 Oct 2015 20:54:20 -0000
Received: from [98.137.101.84] by t1.bullet.mail.gq1.yahoo.com with NNFMP; 25 Oct 2015 20:54:20 -0000
Date: 25 Oct 2015 20:54:20 +0000
Received: from [127.0.0.1] by nu-repl01.direct.gq1.yahoo.com with NNFMP; 25 Oct 2015 20:54:20 -0000
X-yahoo-newman-expires: 1445810060
From: "Yahoo Mail" <mail@product.communications.yahoo.com>
Reply-To: replies@communications.yahoo.com
To: <ME>@yahoo.com
Subject: Welcome to Yahoo! Vladimir
X-Yahoo-Newman-Property: ydirect
Content-Type: text/html
Content-Length: 25180

I skipped only X-YMailISG.

EDIT II

Of 167 messages 21 have no MESSAGE-ID header.
fetch_ssl.py takes 4m12.342s, and fetch_tools.py -- 3m41.965s

  • Can you print out your whole message that doesn’t have a Message-ID? It’s unusual but not impossible. For your second question, it looks like it’s not using TLS at the start and is trying to connect in plain text, which won’t work with these services. From the docs it looks like MailBoxTls is for STARTTLS, which is not supported by these servers. MailBox is used for regular SSL/TLS connections. – Max Jul 11 '22 at 18:25
  • The message in question is in HTML and huge (393 lines). I'm afraid stackoverflow.com would not like it. I have no idea how to condense it or which parts of it are essential for the problem. If I got you right, Yahoo doesn't like STARTTLS and imap_tools won't go without it. To me it means that I should consider other options, e.g. imaplib2? – Vladimir Zolotykh Jul 11 '22 at 19:23
  • Just post the headers. Use imap_tools.MailBox, not MailBoxTls. MailBox uses TLS from the beginning of the connection, and MailBoxTls uses STARTTLS (a seldom-used variant where you connect in plaintext, then use a STARTTLS protocol command to 'upgrade'). This is opposed to MailBoxUnencrypted, which does neither. – Max Jul 11 '22 at 19:32
  • I added headers to the OP – Vladimir Zolotykh Jul 11 '22 at 19:46
  • It looks simply like the 'welcome email' is slightly not-to-spec; it appears to be automatically generated, and perhaps their template doesn't include a Message-ID. You'll just have to cope with it missing. – Max Jul 11 '22 at 20:07

1 Answer

It looks like the email without a Message-ID legitimately does not have one: the welcome email Yahoo sent you actually lacks the header. Since it's a system-generated email, that's not that unexpected. You'd just have to skip over it, or otherwise cope with the header being absent.
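
One possible way to cope, sketched under assumptions rather than tested against Yahoo: ask the server only for a few header fields instead of the whole RFC822 body (which should also cut the data transferred), and fall back to a synthetic key when Message-ID is absent. The helper names `message_key` and `fetch_keys` and the `no-message-id:` prefix are made up for illustration; `mbox` is assumed to be a logged-in `imaplib.IMAP4_SSL` with a folder already selected.

```python
import email
import hashlib


def message_key(raw_headers: bytes) -> str:
    """Return the Message-ID, or a synthetic key if the header is missing."""
    msg = email.message_from_bytes(raw_headers)
    message_id = msg.get('Message-ID')  # header lookup is case-insensitive
    if message_id and message_id.strip():
        return message_id.strip()
    # Hypothetical fallback: hash the header bytes that were fetched
    # (Date, From, Subject), so the key stays stable between runs.
    return 'no-message-id:' + hashlib.sha1(raw_headers).hexdigest()


def fetch_keys(mbox):
    """Yield (uid, key) pairs using real UIDs and header-only fetches."""
    typ, data = mbox.uid('SEARCH', 'ALL')
    for uid in data[0].split():
        # BODY.PEEK does not set the \Seen flag; only four fields are sent.
        typ, parts = mbox.uid(
            'FETCH', uid,
            '(BODY.PEEK[HEADER.FIELDS (MESSAGE-ID DATE FROM SUBJECT)])')
        yield int(uid), message_key(parts[0][1])
```

This still makes one round trip per message; fetching a whole UID range in a single FETCH would likely be faster, at the cost of parsing a longer response.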

The second problem is that you need to use imap_tools.MailBox. Looking at the documentation and source at the repo it appears that the relevant classes to use are:

  • MailBox - for a normal encrypted connection. This is what most email servers use these days, aka IMAPS (imap with SSL/TLS)
  • MailBoxTls - For a STARTTLS connection: this creates a plaintext connection then upgrades it later by using a STARTTLS command in the protocol. The internet has mostly gone to the "always encrypted" rather than "upgrade" paradigm, so this is not the class to use.
  • MailBoxUnencrypted - Standard IMAP without SSL/TLS. You should not use this on the public internet.

The naming is a bit confusing. MailBox corresponds to imaplib.IMAP4_SSL; MailBoxTls corresponds to imaplib.IMAP4 followed by starttls() on the resulting connection; and MailBoxUnencrypted corresponds to imaplib.IMAP4 with no security applied. I imagine it's this way so the most common one (MailBox) is a safe default.
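
As a rough sketch of that switch (not run against Yahoo or Gmail here): MailBox is used in place of MailBoxTls, and only headers are requested. `list_message_ids` and `main` are illustrative names; the `headers_only` and `mark_seen` arguments and the lowercase-keyed `msg.headers` dict are what I understand imap_tools to provide, so check them against the installed version.

```python
def list_message_ids(mailbox):
    """Yield (uid, message_id) for each message in a logged-in mailbox.

    message_id is None when the header is absent.
    """
    for msg in mailbox.fetch(headers_only=True, mark_seen=False):
        # Assumption: msg.headers maps lowercase names to tuples of values.
        ids = msg.headers.get('message-id', ())
        yield msg.uid, (ids[0].strip() if ids else None)


def main(host, user, password):
    # Imported here so the helper above can be used without imap_tools.
    from imap_tools import MailBox  # TLS from the start, like IMAP4_SSL

    with MailBox(host, port=993).login(user, password, 'INBOX') as mbox:
        for uid, message_id in list_message_ids(mbox):
            print(uid, message_id)
```

Calling `main('imap.mail.yahoo.com', user, pass_)` with the same environment-derived credentials as in the question would be the intended usage.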

Max