I am trying to decode the text that Gmail sends, which should be utf7-imap (actually, if I am not mistaken, UTF-8 encoded inside UTF-7?).

I have read https://en.wikipedia.org/wiki/UTF-7. I am using https://github.com/skeeto/utf-7 to decode the text, and mimetic (https://github.com/tat/mimetic) to parse the raw email text.

The corresponding header (subject in this case) is:

Subject: =?UTF-8?B?15TXldeT16LXlCDXotecINeQ15kg15TXoteR16jXqiDXqtep15zXlQ==?=
 =?UTF-8?B?150g16rXp9eV16TXqteZINeR16rXm9eg15nXqiDXnNep15vXmdeo15nXnQ==?=

The encoding mentioned in the comments applies only to the content (body). Headers should be ASCII only, but some email clients do send some kind of 8-bit encoding (ISO-8859-?). This is not the case for the message I describe.

I assume there is something else I am missing - where can I find documentation about this subject?

I am looking for solutions in C or C++ (the utf7 library I am using is C, and the mime parsing library is in C++). C++ is always a better alternative.

elcuco
  • You need to pick either C or C++. And asking where to find documentation or other off-site resources is explicitly off-topic, so the question might get closed because of that. – Lundin Mar 03 '22 at 11:07
  • You make no mention of C or C++ in your question but tagged the question as being related to these two languages? – Gerhard Mar 03 '22 at 11:56
  • "*I assume there is something else I am missing*" We also have to assume that since you do not include any information in your question about your inputs and results. – Gerhard Mar 03 '22 at 12:00
  • Please take some time to read [the Stack Overflow question checklist](http://meta.stackexchange.com/q/156810/204922). You might also want to learn about [Minimal, Complete, and Verifiable Examples](http://stackoverflow.com/help/mcve). As your question stands now there is insufficient information to be able to help you. – Gerhard Mar 03 '22 at 12:01
  • Gmail does not send utf7-imap, or utf7 at all. Most likely it is utf8. It is specified in the message header, look there. – arnt Mar 03 '22 at 12:30
  • UTF-7 doesn't officially exist except as a proposal. It shouldn't even use the UTF name as it is not sanctioned or supported by the Unicode Consortium. – Mgetz Mar 03 '22 at 15:55

1 Answer

UTF-7 (in a modified form) is used to encode non-ASCII mailbox names in the IMAP protocol. This is not related to your example, which shows an RFC 2822 Subject field with a MIME-encoded value according to RFC 2047.

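In your example (with the "=?UTF-8?B?" prefix) decoding is simple: the string that follows (up to "?=") is the base64 representation of a UTF-8 encoded string. Your header contains two such encoded-words, one on the first line and one on the folded continuation line; decode each one and concatenate the results, ignoring the whitespace between them (RFC 2047, section 6.2).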
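
As a rough illustration of the decoding described above, here is a minimal C++ sketch. The function names `base64_decode` and `decode_rfc2047_b` are made up for this example and are not part of mimetic or the utf-7 library. It assumes the charset is spelled exactly `UTF-8` and the encoding is `B`; RFC 2047 also allows other charsets, the `Q` encoding, and case-insensitive tokens, none of which are handled here.

```cpp
#include <cstddef>
#include <string>

// Decode standard base64. Decoding stops at the first '=' padding
// character or at any character outside the base64 alphabet.
std::string base64_decode(const std::string& in) {
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    unsigned int buffer = 0;  // accumulates 6-bit groups
    int bits = 0;             // number of not-yet-emitted bits in buffer
    for (char c : in) {
        std::size_t pos = alphabet.find(c);
        if (pos == std::string::npos) break;
        buffer = (buffer << 6) | static_cast<unsigned int>(pos);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out.push_back(static_cast<char>((buffer >> bits) & 0xFF));
        }
    }
    return out;
}

// Hypothetical helper: decode every "=?UTF-8?B?...?=" encoded-word in a
// header value and return the result as raw UTF-8 bytes.
// Simplification: only this exact prefix is recognised; all other text is
// copied through unchanged.
std::string decode_rfc2047_b(const std::string& value) {
    const std::string prefix = "=?UTF-8?B?";
    std::string out;
    std::size_t pos = 0;
    bool prev_was_encoded = false;
    while (pos < value.size()) {
        std::size_t start = value.find(prefix, pos);
        std::size_t end = (start == std::string::npos)
                              ? std::string::npos
                              : value.find("?=", start + prefix.size());
        if (end == std::string::npos) {
            // No further complete encoded-word: copy the rest verbatim.
            out.append(value, pos, std::string::npos);
            break;
        }
        std::string gap = value.substr(pos, start - pos);
        // Whitespace (including line folding) that separates two adjacent
        // encoded-words is dropped, per RFC 2047 section 6.2.
        bool gap_is_ws = gap.find_first_not_of(" \t\r\n") == std::string::npos;
        if (!(prev_was_encoded && gap_is_ws)) out += gap;
        std::size_t payload = start + prefix.size();
        out += base64_decode(value.substr(payload, end - payload));
        pos = end + 2;  // skip the closing "?="
        prev_was_encoded = true;
    }
    return out;
}
```

Passing the value of the Subject header from the question to `decode_rfc2047_b` should give the subject as UTF-8 bytes, which can be written directly to a UTF-8 terminal or file. No UTF-7 step is involved.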

Outtruder
  • Section 2 of RFC 2047. Correct. Thanks! – elcuco Mar 03 '22 at 19:37
  • and I found another question on this subject: https://stackoverflow.com/questions/71336003/c-imap-utf7-implementation-in-gmail/71341005#71341005 – elcuco Mar 05 '22 at 09:38