38

I was wondering if anyone knows how the Thread-Index field in email headers works?

Here are the Thread-Index values from a simple chain of emails that I sent to myself.

Email 1 Thread-Index: AcqvbpKt7QRrdlwaRBKmERImIT9IDg==
Email 2 Thread-Index: AcqvbpjOf+21hsPgR4qZeVu9O988Eg==
Email 3 Thread-Index: Acqvbp3C811djHLbQ9eTGDmyBL925w==
Email 4 Thread-Index: AcqvbqMuifoc5OztR7ei1BLNqFSVvw==
Email 5 Thread-Index: AcqvbqfdWWuz4UwLS7arQJX7/XeUvg==

I can't say with certainty how to link these emails together. Normally, I would use the In-Reply-To or References field, but I recently found that BlackBerrys do NOT include these fields. They only include the Thread-Index field.

Tim
  • If you are looking for how to implement message threading, this is very helpful: http://www.jwz.org/doc/threading.html – deepwell Jul 25 '12 at 21:43
  • My experience is with .EML files, as they don't have the ConversationIndex like MSG. I processed about 20,000, and if the first 32 chars matched then they appeared to be in the same email chain. – paparazzo Oct 07 '13 at 20:59

4 Answers

21

They are base64-encoded Conversation Index values. There is no need to reverse engineer them, as they are documented by Microsoft, e.g. at http://msdn.microsoft.com/en-us/library/ms528174(v=exchg.10).aspx and in more detail at http://msdn.microsoft.com/en-us/library/ee202481(v=exchg.80).aspx

The indexes in your example don't seem to represent the same conversation, which probably means that the software that sent the mails wasn't able to link them together.

EDIT: Unfortunately I don't have enough reputation to add a comment, but adamo is right that it contains a timestamp: a somewhat esoterically encoded partial FILETIME. But it also contains a GUID, so it is pretty much guaranteed to be unique for that mail (of course the same mail can exist in multiple copies).
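
To make the layout concrete, here is a rough Python sketch of decoding one of these values. It assumes the layout as I read it from the Microsoft pages linked above: a 22-byte header (1 reserved byte, 5 bytes of timestamp, 16 bytes of GUID), followed by one 5-byte block per reply. Reading the first 6 bytes together as the high-order bytes of a big-endian FILETIME is an assumption, as is the GUID byte order (which only matters for display, not for comparing two values). `decode_thread_index` is a made-up helper name, not part of any library.

```python
import base64
import uuid
from datetime import datetime, timedelta, timezone

# FILETIME counts 100-nanosecond ticks since 1601-01-01 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
HEADER_LEN = 22  # 1 reserved byte + 5 timestamp bytes + 16 GUID bytes
CHILD_LEN = 5    # assumed size of each per-reply block


def decode_thread_index(value):
    """Return (approximate timestamp, GUID, number of reply blocks)."""
    raw = base64.b64decode(value)
    extra = len(raw) - HEADER_LEN
    if extra < 0 or extra % CHILD_LEN != 0:
        raise ValueError("unexpected Thread-Index length: %d" % len(raw))

    # Assumption: bytes 0..5 are the top 6 bytes of a 64-bit FILETIME;
    # the 2 low-order bytes are not stored, so shift them back in as zeros.
    ticks = int.from_bytes(raw[:6], "big") << 16
    sent = FILETIME_EPOCH + timedelta(microseconds=ticks // 10)

    # Assumption: the 16 GUID bytes are taken in the order they appear.
    guid = uuid.UUID(bytes=raw[6:HEADER_LEN])

    return sent, guid, extra // CHILD_LEN


sent, guid, replies = decode_thread_index("AcqvbpKt7QRrdlwaRBKmERImIT9IDg==")
print(sent.isoformat(), guid, replies)
```

Under these assumptions, Email 1 from the question decodes to a date in February 2010 with zero reply blocks, and Email 1 and Email 2 carry different GUIDs, which matches the observation that they were not linked into one conversation.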

poizan42
19

There's a good analysis of how exactly this non-standard "Thread-Index" header appears to be used, in this post and links therefrom, including this pdf (a paper presented at the CEAS 2006 conference) and this follow-up, which includes a comment on the issue from the evolution source code (which seems to reflect substantial reverse-engineering of this undocumented header).

Executive summary: essentially, the author eventually gives up on using this header and recommends and shows a different approach, which is also implemented in the c-client library, part of the UW IMAP Toolkit open source package (which is not for IMAP only -- don't let the name fool you, it also works for POP, NNTP, local mailboxes, &c).

Alex Martelli
  • According to a newer [comment](http://blog.postmaster.gr/2007/12/11/trying-to-make-use-of-outlooks-thread-index-header/#comment-46307) left on my blog post that you mention _"it’s an OLE timestamp (22 bytes), appended with timediffs (5 bytes). which sucks, because the timestamp is not guaranteed unique."_ – adamo Jan 13 '12 at 07:54
  • That "different approach" implemented in the c-client is described here: http://www.jwz.org/doc/threading.html – Alexander Klimetschek Feb 25 '14 at 22:09
  • It's really crazy how much effort people seem to have put into reversing this even though it has been documented by Microsoft since at least 2003 (https://msdn.microsoft.com/en-us/library/ms528174(v=exchg.10).aspx), and most likely far earlier than that (the CDO library was included back in NT 4.0, and the documentation for that probably included the same information). – poizan42 Jul 22 '15 at 10:29
  • @poizan42 it might be documented, but it does not answer one simple question: how do I generate this if I'm not using MS's tech stack. – Alex from Jitbit May 21 '18 at 16:44
  • @Alex, why would you? Just use the standardized References and In-Reply-To headers. Anyways the exact binary format is documented at http://msdn.microsoft.com/en-us/library/ee202481(v=exchg.80).aspx, so just fill that in? – poizan42 May 21 '18 at 17:02
  • @poizan42 "*JUST* fill that in" sounds overly simplified, as the header uses MS's own data structures that we have to learn :) Guess we have no choice though, thanks – Alex from Jitbit May 21 '18 at 17:15
5

I wouldn't be surprised if there are mail clients out there which would not be able to link Blackberry's mails to their threads. The Thread-Index header appears to be a Microsoft extension.

Either way, Novell Evolution implements this. Take a look at this short description of how they do it, or this piece of code that finds the thread parent of a given message.

I assume that, because the lengths of the Thread-Index headers in your example are all the same, these messages were all thread starts? Strange that they're only 22 bytes, though I suppose you could try applying the 5-bytes-per-message rule to them and see if it works for you.
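
If you want to try the 5-bytes-per-message rule, a rough Python sketch could look like the following. It assumes a 22-byte header for thread starts (the length seen in your examples) and that each reply appends exactly 5 bytes to its parent's value, so stripping the last 5 bytes gives the parent's Thread-Index. `parent_thread_index` and `same_conversation` are hypothetical helper names, and this is not taken from the Evolution code.

```python
import base64

HEADER_LEN = 22  # assumed length of a thread-start Thread-Index
CHILD_LEN = 5    # assumed number of bytes appended per reply


def parent_thread_index(value):
    """Return the parent's Thread-Index, or None for a thread start."""
    raw = base64.b64decode(value)
    extra = len(raw) - HEADER_LEN
    if extra < 0 or extra % CHILD_LEN != 0:
        raise ValueError("unexpected Thread-Index length: %d" % len(raw))
    if extra == 0:
        return None  # nothing appended, so treat it as a thread start
    # Assumption: dropping the last 5-byte block yields the parent's value.
    return base64.b64encode(raw[:-CHILD_LEN]).decode("ascii")


def same_conversation(a, b):
    """True if both values share the same 22-byte header."""
    return base64.b64decode(a)[:HEADER_LEN] == base64.b64decode(b)[:HEADER_LEN]


# Email 1 from the question decodes to 22 bytes, so it has no parent here.
print(parent_thread_index("AcqvbpKt7QRrdlwaRBKmERImIT9IDg=="))
```

To thread a mailbox this way, you would index every message by its Thread-Index and look up the value returned by `parent_thread_index`; messages whose parent is missing can still be grouped with `same_conversation`.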

Stéphan Kochen
  • It would seem that non-Outlook email clients don't handle the Thread-Index correctly. The thread indexes from above are from Thunderbird. I checked with Outlook, and it follows the rule you stated. Quite bothersome. – Tim Feb 22 '10 at 18:56
  • From looking at a bunch of Outlook-generated Thread-Index headers, I get the feeling that the linked description is slightly wrong: Thread starters have a 22-byte decoded Thread-Index, not 27. – dkarp Jan 04 '11 at 22:17
  • Here is a related bug in the Mozilla (Thunderbird) bugtracker: https://bugzilla.mozilla.org/show_bug.cgi?id=331207 – guettli Jul 08 '11 at 07:15
1

If you are interested in parsing the Thread-Index in C#, please take a look at this post:

http://forum.rebex.net/questions/3841/how-to-interprete-thread-index-header

The snippet you will find there lets you parse the Thread-Index and retrieve the thread GUID and message DateTime. There is a problem, however: it does not work for all Thread-Index values out there. The open question is why some Thread-Index values produce an invalid DateTime, and what to do to support all of them.

Gralin