Searching for UTF-8 encoded subjects with imaplib

Question

I have some working code to fetch mail bodies, and I want to filter on the subject with a non-ASCII string. Other forums suggest using the .uid method to do so, but its behavior does not seem logical to me.

Current code:

import imaplib
import email

username = "secret"
password = "secret"

imap = imaplib.IMAP4_SSL("imap.gmail.com")
imap.login(username, password)

status, messages = imap.select("INBOX", readonly=True)

res, msg = imap.search(None, 'HEADER Subject "string to be encoded with UTF-8"')

Suggested code:

import imaplib
import email

username = "secret"
password = "secret"

imap = imaplib.IMAP4_SSL("imap.gmail.com")
imap.login(username, password)

status, messages = imap.select("INBOX", readonly=True)

imap.literal = u"string to be encoded with UTF-8".encode('utf-8')
res, msg = imap.uid('SEARCH', 'CHARSET', 'UTF-8', 'SUBJECT')

The suggested code works fine, but the returned array (msg[0]) contains indices that are out of bounds for the mailbox. By contrast, when I use the .search method, valid indices are returned, but only as long as I search for ASCII strings (non-ASCII strings aren't accepted here, whether or not they are UTF-8 encoded). Because of this, I don't quite understand the behaviour and logic of .uid. I'd be grateful if someone could point me in the right direction.

How can I filter the subject with a UTF-8 string?

The only difference between 'search' and 'uid search' is whether it returns Message Sequence Numbers (MSNs, numbered 1 to N, N being the number of mails in your mailbox) or Unique Identifiers (UIDs, which increase and are generally never reused, even if messages are deleted). UIDs are more stable, and it's recommended to use them for everything: `x.fetch(...)` -> `x.uid('FETCH', ...)`, `x.search(...)` -> `x.uid('SEARCH', ...)`, etc. — Max, Sep 08 '20 at 18:03
I managed to solve the scenario with the following: `res, msg = imap.uid('search', None, 'HEADER Subject "Subject to filter"')` `messages = msg[0].decode('utf-8').split()` `for uid in messages: ` ` res, msg = imap.uid('fetch', uid, '(RFC822)') ` ` #parsing logic to follow ` Thanks for your effort and suggestion to use UID instead. It is stated on the imaplib pages that `.uid` is recommended. — Simon W, Sep 10 '20 at 10:10
Try using a high-level library: https://github.com/ikvk/imap_tools — Vladimir, Nov 02 '20 at 06:29
Thanks for the suggestion @Vladimir, I will give this a shot in my next project. — Simon W, Nov 03 '20 at 07:27
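
To make the MSN/UID distinction from the first comment concrete, here is a toy model that runs without a mail server. The UID values are invented for illustration; the point is only that a UID can be far larger than the number of messages, which is why UIDs look "out of bounds" when they are used as sequence numbers with `.search`/`.fetch` instead of `.uid(...)`.

```python
# Toy model of a mailbox; the UID values are invented for illustration.
# UIDs keep growing and are generally not reused, so deletions leave gaps.
remaining_uids = [101, 105, 230]

# Message sequence numbers (MSNs) are just positions 1..N in the current mailbox.
msn_to_uid = dict(enumerate(remaining_uids, start=1))


def is_valid_msn(number):
    """Return True if `number` is a usable sequence number for this mailbox."""
    return 1 <= number <= len(remaining_uids)


print(msn_to_uid)
print(is_valid_msn(3))    # a real position in the mailbox
print(is_valid_msn(230))  # a UID mistaken for a sequence number
```

In other words, IDs returned by `imap.uid('SEARCH', ...)` should be passed to `imap.uid('FETCH', ...)`, and IDs returned by `imap.search(...)` should be passed to `imap.fetch(...)`.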

score 1 · Accepted Answer · answered Sep 10 '20 at 10:36

I managed to solve the scenario with the following, using the recommended way with .uid instead of .search:

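import imaplib

imap = imaplib.IMAP4_SSL("server_to_connect_to")
imap.login(username, password)

status, messages = imap.select("INBOX", readonly=True)
# The search string is sent as a literal, so it can contain non-ASCII bytes
imap.literal = u'"Subject to be searched"'.encode('utf-8')
res, msg = imap.uid('SEARCH', 'CHARSET', 'UTF-8', 'SUBJECT')
# msg[0] is a space-separated byte string of UIDs
messages = msg[0].decode('utf-8').split()

for uid in messages:
    # Fetch by UID, since the search returned UIDs
    res, msg = imap.uid('fetch', uid, '(RFC822)')
    #parsing logic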
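
The search step above can be wrapped in a small helper. This is only a sketch: the function name `search_subject_uids` is made up, and it assumes `imap` is an already logged-in `imaplib` connection with a mailbox selected.

```python
def search_subject_uids(imap, subject):
    """Return the UIDs (as str) of messages whose subject matches `subject`.

    Assumes `imap` is a logged-in imaplib connection with a mailbox selected.
    """
    # The search string is sent as an IMAP literal, so it may be UTF-8 bytes.
    imap.literal = subject.encode("utf-8")
    status, data = imap.uid("SEARCH", "CHARSET", "UTF-8", "SUBJECT")
    if status != "OK":
        raise RuntimeError(f"SEARCH failed: {data!r}")
    # data[0] is a space-separated byte string of UIDs, e.g. b"101 105 230".
    return (data[0] or b"").decode("ascii").split()
```

Each returned UID can then be passed to `imap.uid('fetch', uid, '(RFC822)')`, as in the loop above.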

score 0 · Answer 2 · answered Aug 21 '22 at 01:26

You can also use search and then fetch each returned ID:

```
import imaplib

EMAIL = 'your_email'
PASSWORD = 'your_password'

imap = imaplib.IMAP4_SSL("imap.gmail.com")
imap.login(EMAIL, PASSWORD)
imap.select("INBOX", readonly=True)

# The search string is sent as a literal, encoded as UTF-8
imap.literal = "Subjéct tô be searchéd".encode("utf-8")
# search() returns message sequence numbers, so fetch() is used below
_, list_email_id = imap.search("utf-8", "SUBJECT")

for email_id in list_email_id[0].split():
    _, data = imap.fetch(email_id, "(RFC822)")
    #email = MailMessage(data)
```

You can use imap_tools to wrap the data, since imaplib returns raw bytes that need decoding. The MailMessage class of imap_tools handles this and is easy to use.
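
If you would rather stay with the standard library, the decoding step can be sketched with the `email` package instead. The helper name `message_from_fetch` and the sample message bytes are made up for illustration; the sample is only shaped like the `data` that a `fetch(..., "(RFC822)")` call returns.

```python
import email
from email import policy


def message_from_fetch(data):
    """Build an EmailMessage from the `data` part of an imaplib fetch result."""
    for part in data:
        # The message arrives as a (header, raw_bytes) tuple; other items
        # such as b")" are just protocol framing.
        if isinstance(part, tuple):
            return email.message_from_bytes(part[1], policy=policy.default)
    raise ValueError("no message found in fetch response")


# Invented sample message with a UTF-8 encoded subject header.
raw = (
    b"Subject: =?utf-8?q?Subj=C3=A9ct_t=C3=B4_be_search=C3=A9d?=\r\n"
    b"Content-Type: text/plain; charset=\"utf-8\"\r\n"
    b"\r\n"
    b"Hello\r\n"
)
sample_data = [(b"1 (RFC822 {%d}" % len(raw), raw), b")"]

message = message_from_fetch(sample_data)
# With policy.default, encoded header words are decoded to str automatically.
print(message["Subject"])
```

Using `policy.default` avoids calling `email.header.decode_header` by hand for headers such as the subject.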
