
When we send a Korean email through Exchange Server, it arrives with MIME headers declaring quoted-printable encoding and a UTF-8 charset, while the HTML body contains the tag: <meta content="text/html; charset=euc-kr" http-equiv="Content-Type"/>.
We parse the emails with nodemailer.

The final Korean text looks like: 하나은행 보안메일
EML QP: =ED=95=98=EB=82=98=EC=9D=80=ED=96=89 =EB=B3=B4=EC=95=88=EB=A9=94=EC=9D=BC

On the other hand, when we send the same email via the SMTP Connector, it arrives broken.
The garbled Korean text looks like: 占싹놂옙占쏙옙占쏙옙 占쏙옙占싫몌옙占쏙옙
EML QP: =E5=8D=A0=EC=8B=B9=EB=86=82=EC=98=99=E5=8D=A0=EC=8F=99=EC=98=99=E5=8D=A0=EC=8F=99=EC=98=99 =E5=8D=A0=EC=8F=99=EC=98=99=E5=8D=A0=EC=8B=AB=EB=AA=8C=EC=98=99=E5=8D=A0=EC=8F=99=EC=98=99
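
To check what the two quoted-printable lines above actually contain, a small decoder is enough. The following is a minimal sketch using only Node.js built-ins; `decodeQuotedPrintable` is a hypothetical helper written for illustration (it is not part of nodemailer or libqp) and it handles only `=XX` escapes and soft line breaks.

```javascript
// Hypothetical helper: decode a quoted-printable string into raw bytes.
function decodeQuotedPrintable(input) {
  const bytes = [];
  // Drop soft line breaks ("=" at the end of a line) before decoding.
  const text = input.replace(/=\r?\n/g, '');
  for (let i = 0; i < text.length; i++) {
    const hex = text.slice(i + 1, i + 3);
    if (text[i] === '=' && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16)); // "=ED" becomes the byte 0xED
      i += 2;
    } else {
      bytes.push(text.charCodeAt(i) & 0xff); // literal ASCII character
    }
  }
  return Buffer.from(bytes);
}

const goodQp =
  '=ED=95=98=EB=82=98=EC=9D=80=ED=96=89 =EB=B3=B4=EC=95=88=EB=A9=94=EC=9D=BC';
const brokenQp =
  '=E5=8D=A0=EC=8B=B9=EB=86=82=EC=98=99=E5=8D=A0=EC=8F=99=EC=98=99=E5=8D=A0=EC=8F=99=EC=98=99 ' +
  '=E5=8D=A0=EC=8F=99=EC=98=99=E5=8D=A0=EC=8B=AB=EB=AA=8C=EC=98=99=E5=8D=A0=EC=8F=99=EC=98=99';

// Both payloads are interpreted as UTF-8, as the MIME charset declares.
const goodText = decodeQuotedPrintable(goodQp).toString('utf8');
const brokenText = decodeQuotedPrintable(brokenQp).toString('utf8');

console.log(goodText);
console.log(brokenText);
```

Both lines decode as well-formed UTF-8, which suggests the damage happened before the message was quoted-printable encoded, not during transfer or parsing.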

I assume the problem is a misconfiguration of the Exchange Server; maybe it decodes UTF-16 as UTF-8. Unfortunately, we don't have access to the remote Exchange Server, so the only option is to fix the broken text locally after it arrives.

This is an example that didn't work:

const libqp = require('libqp');
const iconv = require('iconv-lite');
let html = libqp.decode(res);
let html2 = iconv.decode(html, 'euc-kr');

UPDATE: Thanks to https://stackoverflow.com/users/3439404/josefz this issue can be reproduced: iconv.decode(iconv.encode(iconv.decode(iconv.encode('하나은행 보안메일', 'euc_kr'), 'utf_8'), 'utf_8'), 'euc-kr')
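
The same reproduction can be traced step by step with Node.js built-ins, which makes the lossy step visible. This is only a sketch under assumptions: it hard-codes what should be the EUC-KR bytes of the sample text, and it relies on `TextDecoder` accepting the `euc-kr` label, which requires a Node.js build with full ICU.

```javascript
// Assumed EUC-KR bytes of '하나은행 보안메일' (hard-coded for illustration).
const eucKrBytes = Buffer.from('c7cfb3aac0bac7e020bab8bec8b8dec0cf', 'hex');

// Sanity check: decoding these bytes as EUC-KR should give the intended text.
const original = new TextDecoder('euc-kr').decode(eucKrBytes);

// Step 1: the EUC-KR bytes are wrongly decoded as UTF-8.
// Invalid sequences are replaced with U+FFFD, so those bytes are lost here.
const wronglyDecoded = new TextDecoder('utf-8').decode(eucKrBytes);

// Step 2: the damaged string is re-encoded as UTF-8.
const reencoded = new TextEncoder().encode(wronglyDecoded);

// Step 3: the UTF-8 bytes are decoded as EUC-KR, as the meta tag requests.
const mojibake = new TextDecoder('euc-kr').decode(reencoded);

console.log(original);
console.log(JSON.stringify(wronglyDecoded)); // shows the U+FFFD characters
console.log(mojibake);
```

Printing the intermediate string from step 1 shows how many bytes were already replaced with U+FFFD before the text was ever re-encoded.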

Now, we have to run it the opposite way...
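
Running it the opposite way means encoding the garbled text back to EUC-KR bytes and then reading those bytes as UTF-8. Below is a hedged sketch using only Node.js built-ins. Node has no built-in EUC-KR encoder, so the sketch builds a reverse lookup table with a hypothetical helper, `buildEucKrEncoder`, on top of `TextDecoder('euc-kr')`; this assumes a Node.js build with full ICU.

```javascript
// Hypothetical helper: build a char -> [lead, trail] table by decoding
// every candidate two-byte EUC-KR sequence once.
function buildEucKrEncoder() {
  const decoder = new TextDecoder('euc-kr');
  const table = new Map();
  for (let lead = 0x81; lead <= 0xfe; lead++) {
    for (let trail = 0x41; trail <= 0xfe; trail++) {
      const ch = decoder.decode(Uint8Array.of(lead, trail));
      // Keep only sequences that decode to exactly one real character.
      if (ch.length === 1 && ch !== '\uFFFD' && !table.has(ch)) {
        table.set(ch, [lead, trail]);
      }
    }
  }
  return (text) => {
    const bytes = [];
    for (const ch of text) {
      if (ch.charCodeAt(0) < 0x80) bytes.push(ch.charCodeAt(0)); // ASCII
      else if (table.has(ch)) bytes.push(...table.get(ch));
      else throw new Error(`No EUC-KR mapping for ${JSON.stringify(ch)}`);
    }
    return Buffer.from(bytes);
  };
}

const encodeEucKr = buildEucKrEncoder();
const mojibake = '占싹놂옙占쏙옙占쏙옙 占쏙옙占싫몌옙占쏙옙';

// Undo the last step: garbled text -> EUC-KR bytes -> UTF-8 text.
const recovered = encodeEucKr(mojibake).toString('utf8');

// Count the U+FFFD replacement characters that survive the reversal.
const lost = (recovered.match(/\uFFFD/g) || []).length;
console.log(JSON.stringify(recovered), `replacement characters: ${lost}`);
```

If the reproduction in the update is accurate, the reversed text should consist mostly of U+FFFD replacement characters. That would mean the original EUC-KR bytes were discarded when the text was first decoded as UTF-8, so they cannot be restored from the received message alone.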

mirik
  • 1
    I'd guess that `meta content="text/html; charset=euc-kr"` is wrong and should supposedly be `meta content="text/html; charset=UTF8"`. You are facing a **double [mojibake](https://en.wikipedia.org/wiki/Mojibake)** case (*example in Python for its universal intelligibility*): `'하나은행 보안메일'.encode( 'euc_kr').decode( 'utf_8', 'replace').encode( 'utf_8' ).decode( 'euc-kr')` returns `'占싹놂옙占쏙옙占쏙옙 占쏙옙占싫몌옙占쏙옙'` … – JosefZ Jul 10 '23 at 19:57
  • 1
    What programming language is it? Please remove one of your tags and add the language tag for visibility. – Constantin Hong Jul 12 '23 at 17:24
  • @JosefZ, do you have a way to transform it in the opposite direction? – mirik Jul 18 '23 at 09:29

1 Answer


You could use iconv-lite, combined with quoted-printable, to convert the broken Korean text to the correct encoding:

const qp = require('quoted-printable');
const iconv = require('iconv-lite');

const brokenText = '=E5=8D=A0=EC=8B=B9=EB=86=82=EC=98=99=E5=8D=A0=EC=8F=99=EC=98=99=E5=8D=A0=EC=8F=99=EC=98=99 =E5=8D=A0=EC=8F=99=EC=98=99=E5=8D=A0=EC=8B=AB=EB=AA=8C=EC=98=99=E5=8D=A0=EC=8F=99=EC=98=99';


//let decodedText = qp.decode(brokenText);
//let utf8_txt = iconv.decode(decodedText, 'euc-kr', { ignoreBOM: true });
const utf8Text = iconv.decode(Buffer.from(brokenText, 'binary'), 'euc-kr');


console.log(utf8Text);

sylvain