I use the imap library with Node.js to fetch some e-mails, and that all works fine. imap is a low-level library and passes all data to my application without decoding it.

The correct subject is "GMX Täßt". From the library I receive =?UTF-8?Q?GMX_T=C3=A4=C3=9Ft?=, which is fine so far. When I try to convert it back to "GMX Täßt" (I have tried many things), I get GMX T├ñ├ƒt, without the umlauts.

const stringFromSubject = "=?UTF-8?Q?GMX_T=C3=A4=C3=9Ft?=";

decodeURIComponent(
  stringFromSubject.replace(/=([0-9A-F]{2})/g, "%$1")
).replace(/_/g, " ");

Here is the link of the lib. Can anyone help to convert it back "correctly"?

Max
Flo
  • A quick [Search for "javascript convert utf-7"](https://www.google.com/search?q=javascript+convert+utf-7) gets you https://github.com/kkaefer/utf7, and also [This question](https://stackoverflow.com/questions/60793606), which includes a hacked-together solution – Jonathan Feb 26 '23 at 13:15
  • The problem is the Windows "shell" (the Command Prompt)! It doesn't display umlauts or emoji correctly. If I save everything into a simple text file with Node's `writeFileSync`, it is all shown correctly... – Flo Feb 26 '23 at 13:20
  • Yes, Windows console has poor non-ASCII character support. I use [Windows Terminal](https://apps.microsoft.com/store/detail/windows-terminal), which has better support, plus other better things as well – Jonathan Feb 26 '23 at 13:23
  • Doesn't Windows Terminal use PowerShell and Command Prompt under the hood, too? – Flo Feb 26 '23 at 13:30
  • Yes. But the limitation is not of PowerShell/CMD themselves, but of the Windows Console which hosts them - and Windows Terminal replaces that. – Jonathan Feb 26 '23 at 13:36
  • That is not UTF-7 (which is only used for IMAP folder names). That's RFC 2047 Encoded Words. The UTF-8 is the charset, and the Q means it's using Quoted-Printable encoding for the non-ASCII portions. It looks like [`parseHeader`](https://github.com/mscdex/node-imap) can decode encoded-words. – Max Feb 26 '23 at 13:55

1 Answer

E-mail processing is all about encapsulation, layers and compatibility. At the very bottom of the stack, there's the historical RFC 822 (then RFC 2822, and these days RFC 5322) which says how ASCII text is encoded into e-mail headers. On top of that, there's MIME which defines in RFC 2047 how Unicode text can be encoded using just ASCII letters, for use in e-mail headers.

What you get from your imap library is a text prior to RFC 2047 decoding. To convert that to Unicode (and to merge these encoded-words into a piece of text), follow the process described in that document (or use a library that implements this functionality for you). Do not use "URI decoding" from RFC 3986 for that; that's a different standard which is solving an unrelated problem.
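As a rough illustration of that process, here is a minimal sketch of RFC 2047 decoding in Node.js. The function name `decodeEncodedWords` is hypothetical and not part of any library; the sketch assumes a Node.js version with the global `TextDecoder` and `Buffer`, and it skips several details of the RFC (line length limits, strict placement rules for encoded-words), so treat it as a starting point rather than a complete implementation.

```javascript
// Hypothetical helper: decodes RFC 2047 encoded-words in a header value.
// This is a simplified sketch, not a complete implementation of the RFC.
function decodeEncodedWords(header) {
  // An encoded-word looks like: =?charset?encoding?encoded-text?=
  const encodedWord = /=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=/g;

  // Whitespace between two adjacent encoded-words is not part of the text.
  const joined = header.replace(/(\?=)\s+(?==\?[^?\s]+\?[bBqQ]\?)/g, "$1");

  return joined.replace(encodedWord, (match, charset, encoding, text) => {
    let bytes;
    if (encoding.toUpperCase() === "B") {
      // "B" encoding is plain Base64.
      bytes = Buffer.from(text, "base64");
    } else {
      // "Q" encoding: "_" stands for a space, "=XX" is one byte in hex.
      const raw = text.replace(/_/g, " ");
      const out = [];
      for (let i = 0; i < raw.length; i++) {
        const hex = raw.slice(i + 1, i + 3);
        if (raw[i] === "=" && /^[0-9A-Fa-f]{2}$/.test(hex)) {
          out.push(parseInt(hex, 16));
          i += 2;
        } else {
          out.push(raw.charCodeAt(i));
        }
      }
      bytes = Buffer.from(out);
    }
    try {
      // Interpret the raw bytes in the declared charset
      // (dropping an optional "*language" suffix).
      return new TextDecoder(charset.split("*")[0]).decode(bytes);
    } catch (err) {
      // Unknown charset: leave the encoded-word untouched.
      return match;
    }
  });
}

const subject = decodeEncodedWords("=?UTF-8?Q?GMX_T=C3=A4=C3=9Ft?=");
console.log(subject === "GMX T\u00e4\u00dft"); // true
```

The key difference from URI decoding is that the `=XX` escapes are collected as raw bytes first and only then interpreted in the charset named inside the encoded-word, so the same code path works for charsets other than UTF-8. Comparing against an escaped string, as in the last line, sidesteps the console rendering problem mentioned in the comments.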

Jan Kundrát