I'm accessing Gmail's IMAP interface through Python. I run a command like this:

UID SEARCH HEADER Message-ID "abcdef@abc.com"

That succeeds (it returns the UID of the matching message, or nothing if no such message exists). However, if the search text contains certain chars (like & or !), the search text is truncated at that point. This means:

UID SEARCH HEADER Message-ID "!abcdef@abc.com"

is treated the same as:

UID SEARCH HEADER Message-ID ""

Also:

UID SEARCH HEADER Message-ID "abc!def@abc.com"

is treated as:

UID SEARCH HEADER Message-ID "abc"

I've gone through the IMAP spec, and from its ABNF grammar it seems like those chars should be valid. Why is Gmail truncating these search phrases at the "!" and "&" chars? Is there a way to escape them? (I've tried \!, which fails as a badly-encoded string.) Is there an RFC or doc that shows what really should be accepted? Is this a bug in Gmail's IMAP implementation?

I've also tried the literal format, with the same results:

UID SEARCH HEADER Message-ID {15}
abc!def@abc.com

Still treated as:

UID SEARCH HEADER Message-ID {3}
abc

Thanks!

IMAP RFC 3501 Search Command: https://www.rfc-editor.org/rfc/rfc3501#section-6.4.4
Formal syntax: https://www.rfc-editor.org/rfc/rfc3501#section-9

rocketmonkeys
  • I can confirm that there is nothing special about using an exclamation mark in the search query. It is most likely that you found a bug in gmail. I suggest using several different IMAP servers during development, in particular since gmail's IMAP implementation is not well known for its conformity to the IMAP specification. – nosid Apr 01 '12 at 19:24
  • Thanks nosid. Unfortunately, the IMAP server I need to use with this code is gmail, so testing on others won't help with this bug. But it is good to know that I'm not reading the spec wrong. I'll try to find a way to report this bug to google. – rocketmonkeys Apr 03 '12 at 20:57
  • Yes, I currently experience this problem when doing an IMAP search on Gmail through *alpine* mail client trying to select all messages with subjects containing `!`. – imz -- Ivan Zakharyaschev Sep 13 '16 at 15:33
  • I'd also ask: **How to overcome this bug in GMail and do such searches?** – imz -- Ivan Zakharyaschev Sep 13 '16 at 15:34
  • See also a discussion of this problem at https://groups.google.com/forum/#!topic/google-mail-xoauth-tools/fq1UZ44C8Yo . – imz -- Ivan Zakharyaschev Sep 13 '16 at 15:50
  • Google's IMAP search breaks things up into "words", which is probably why special characters get treated strangely. I echo the recommendation in the groups above: try using X-GM-RAW and sending google search keywords. – Max Sep 13 '16 at 16:04
  • @Max Thanks for echoing this useful recommendation! `X-GM-RAW` extension is documented at https://developers.google.com/gmail/imap_extensions#extension_of_the_search_command_x-gm-raw , as it was pointed out in http://stackoverflow.com/q/11517375/94687 . – imz -- Ivan Zakharyaschev Sep 13 '16 at 16:22
  • How do I search by headers' substring with X-GM-RAW? Is `subject:(!)` correct for searching for subjects containing an exclamation mark? Is there a family of `rfc822*` keys in GMail? – imz -- Ivan Zakharyaschev Sep 13 '16 at 17:22
  • As for searching for `!` in the subject, it seems not to work with any queries in the GMail web interface either (so X-GM-RAW wouldn't work either). See http://webapps.stackexchange.com/q/31322/15124 , http://webapps.stackexchange.com/q/52828/15124 . Very inconvenient! So, an external IMAP client for this kind of search is not a solution (unless the client does the filtering itself, without relying on server responses to `SEARCH`). – imz -- Ivan Zakharyaschev Sep 14 '16 at 12:06
  • @imz--IvanZakharyaschev check out my answer below. The same approach could be used no matter what search criteria you want to use. – jstedfast Sep 16 '16 at 01:10
  • @imz--IvanZakharyaschev please accept an answer. – jstedfast Sep 20 '16 at 15:04
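
For reference, the X-GM-RAW suggestion from the comments above could be tried from Python roughly as follows. This is a minimal sketch, not a confirmed fix: `X-GM-RAW` is Gmail's documented SEARCH extension and `rfc822msgid:` is a Gmail search operator, but whether that operator copes with `!` or `&` any better than `HEADER` is not established in this thread. The helper name `gm_raw_criteria` is hypothetical.

```python
def gm_raw_criteria(message_id):
    """Build the arguments for a Gmail X-GM-RAW search on a Message-ID.

    Hypothetical helper: it only builds the search keys, it does not talk
    to the server.
    """
    query = "rfc822msgid:" + message_id
    # Quote the query as an IMAP quoted string (escape backslash and quote).
    quoted = '"' + query.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return ["X-GM-RAW", quoted]


# Usage, assuming `imap` is an authenticated imaplib.IMAP4_SSL connection
# to Gmail with a mailbox already selected:
#   typ, data = imap.uid("SEARCH", *gm_raw_criteria("abc!def@abc.com"))
```

If the server-side result is still unreliable, the candidates it returns can be re-checked on the client, as the last comment above suggests.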

2 Answers

I'm largely basing my answer on the discovery (by Max) in the comments to the original question that GMail's SEARCH implementation uses a backing database that has already split textual content into word tokens rather than storing the full text and doing a substring search.

So here's a possible workaround that you could use with GMail in C# using my MailKit library (which is a fairly low-level IMAP library so this should easily translate into basic pseudocode):

// given: text = "abc!abcdef@abc.com"

// split the search text on '!'
var words = text.Split (new char[] { '!' }, StringSplitOptions.RemoveEmptyEntries);

// build a search query...
var query = SearchQuery.HeaderContains ("Message-ID", words[0]);
for (int i = 1; i < words.Length; i++)
    query = query.And (SearchQuery.HeaderContains ("Message-ID", words[i]));

// this will result in a query like this:
// HEADER "Message-ID" "abc" HEADER "Message-ID" "abcdef@abc.com"

// Do the UID SEARCH with the constructed query:
// A001 UID SEARCH HEADER "Message-Id" "abc" HEADER "Message-Id" "abcdef@abc.com"
var uids = mailbox.Search (query);

// Now UID FETCH the ENVELOPE (and UID) for each of the potential matches:
// A002 UID FETCH <uids> (UID ENVELOPE)
var messages = mailbox.Fetch (uids, MessageSummaryItems.UniqueId |
    MessageSummaryItems.Envelope);

// Now perform a manual comparison of the Message-IDs to get only exact matches...
var matches = new UniqueIdSet (SortOrder.Ascending);
foreach (var message in messages) {
    if (message.Envelope.MessageId.Contains (text))
        matches.Add (message.UniqueId);
}

// 'matches' now contains only the set of UIDs that exactly match your search query
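
Since the question uses Python, the same workaround can be sketched with the standard `imaplib` module. This is a rough sketch that has not been run against Gmail: the helper names (`split_tokens`, `build_criteria`, `filter_exact`, `search_message_id`) are hypothetical, and the set of splitting characters (`!` and `&`, the two reported in the question) is an assumption.

```python
import imaplib
import re

# Characters reported in the question as truncating Gmail's search text.
# Assumption: extend this set if other characters turn out to misbehave.
SPLIT_CHARS = "!&"


def split_tokens(text, split_chars=SPLIT_CHARS):
    """Split the search text on the problematic characters, dropping empties."""
    return [t for t in re.split("[" + re.escape(split_chars) + "]", text) if t]


def quote(s):
    """Format s as an IMAP quoted string."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_criteria(header, tokens):
    """Build HEADER search keys; multiple keys are implicitly ANDed by IMAP."""
    return [part for tok in tokens for part in ("HEADER", header, quote(tok))]


def filter_exact(candidates, text):
    """Keep only UIDs whose fetched header value contains the full text."""
    return [uid for uid, value in candidates.items() if text in value]


def search_message_id(imap, text):
    """Search the selected mailbox of an imaplib.IMAP4 connection.

    Untested sketch: response parsing assumes the usual imaplib layout of
    (envelope, payload) tuples for each fetched message.
    """
    tokens = split_tokens(text)
    if not tokens:
        return []
    # UID SEARCH HEADER Message-ID "abc" HEADER Message-ID "def@abc.com"
    typ, data = imap.uid("SEARCH", *build_criteria("Message-ID", tokens))
    uids = data[0].split() if data and data[0] else []
    if not uids:
        return []
    # Fetch only the Message-ID header of each candidate.
    typ, data = imap.uid("FETCH", b",".join(uids).decode(),
                         "(BODY.PEEK[HEADER.FIELDS (MESSAGE-ID)])")
    candidates = {}
    for item in data:
        if isinstance(item, tuple):
            m = re.search(rb"UID (\d+)", item[0])
            if m:
                candidates[m.group(1).decode()] = item[1].decode("ascii", "replace")
    # The server-side match is only approximate, so compare locally.
    return filter_exact(candidates, text)
```

As in the C# version, the server search only narrows the candidates; the final comparison happens on the client.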
jstedfast
I've been hitting this issue myself for months now.

SEARCH HEADER Message-ID <-!&!...>

I ended up skipping some Message-ID searches that start with '<-'. I also see the problems with &!'s ... Not sure how to work around this well.

Have you ever heard back from Google about this bug?

Thanks much