How to get two-sequence representation of UTF-8 character using JavaMail's MimeUtility or Apache Commons and quoted-printable?

Question

I have a string that contains the German character ü. Its Unicode code point is U+00FC (0xFC), but in UTF-8 its quoted-printable sequence should be =C3=BC rather than =FC. However, using JavaMail's MimeUtility as shown below, I only get the single-byte representation.

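import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.mail.internet.MimeUtility;

String s = "Für";
ByteArrayOutputStream baos = new ByteArrayOutputStream();
OutputStream encodedOut = MimeUtility.encode(baos, "quoted-printable");

encodedOut.write(s.getBytes(StandardCharsets.UTF_8));
encodedOut.flush();   // push any pending encoder output into baos
String encoded = new String(baos.toByteArray(), StandardCharsets.US_ASCII);   // F=FCr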
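
For comparison, here is a minimal, dependency-free sketch of what a quoted-printable encoder does with the UTF-8 bytes of "Für". `QpEncodeSketch` and `qpEncode` are hypothetical names, and the encoder is deliberately simplified (it ignores the 76-character line limit and the whitespace rules of RFC 2045), so it is not JavaMail's implementation. The ü is written as the escape `\u00FC` so the result does not depend on the encoding javac assumes for the source file.

```java
import java.nio.charset.StandardCharsets;

// Simplified quoted-printable encoder for illustration only:
// it ignores the 76-character line limit and the whitespace rules of RFC 2045.
class QpEncodeSketch {

    static String qpEncode(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int v = b & 0xFF;                                   // treat the byte as unsigned
            boolean printable = v >= 33 && v <= 126 && v != '=';
            if (printable) {
                sb.append((char) v);                            // literal ASCII character
            } else {
                sb.append('=').append(String.format("%02X", v)); // =XX escape
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "F\u00FCr";                                  // "Für"
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);       // 46 C3 BC 72
        System.out.println(qpEncode(utf8));                     // F=C3=BCr
    }
}
```

With correct UTF-8 bytes as input, each non-ASCII byte becomes its own `=XX` escape, which is where the two-part `=C3=BC` comes from; this matches the `F=C3=BCr` VGR reports in the comments.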

(Using StandardCharsets.US_ASCII instead of UTF_8 resulted in F?r, which is obviously not what I want.)
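
The `F?r` result is expected behavior: `String.getBytes(Charset)` replaces characters that the target charset cannot represent with that charset's default replacement, which for US-ASCII is `?` (0x3F). A small sketch with hypothetical helper names, contrasting this with the two UTF-8 bytes of ü:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class AsciiReplacementSketch {

    // Encodes to US-ASCII and decodes again; unmappable characters such as ü
    // are replaced by '?' (0x3F) during getBytes.
    static String roundTripAscii(String s) {
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    // Raw UTF-8 bytes of the string, for comparison.
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "F\u00FCr";                                // "Für"
        System.out.println(roundTripAscii(s));                // F?r
        System.out.println(Arrays.toString(utf8Bytes(s)));    // [70, -61, -68, 114]
    }
}
```

Java bytes are signed, so 0xC3 and 0xBC print as -61 and -68.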

I have also looked at Apache Commons Codec's QuotedPrintableCodec, which I used like this:

import java.nio.charset.StandardCharsets;
import org.apache.commons.codec.net.QuotedPrintableCodec;

String s = "Für";
QuotedPrintableCodec qpc = new QuotedPrintableCodec();
String encoded = qpc.encode(s, StandardCharsets.UTF_8);

However, this produced F=EF=BF=BDr, which matches what Java's URLEncoder produces apart from the escape character (% instead of =, i.e. F%EF%BF%BDr). I don't understand where these bytes come from.
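
The three bytes EF BF BD are the UTF-8 encoding of U+FFFD, the Unicode replacement character (�). A result of `F=EF=BF=BDr` therefore suggests that the string already contained U+FFFD before it reached the codec. One way this can happen (an assumption, not a diagnosis of this particular case) is decoding bytes with a charset that does not match them, such as the ISO-8859-1 byte 0xFC decoded as UTF-8. The class and helper names below are hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ReplacementCharSketch {

    // Formats bytes as space-separated uppercase hex, e.g. "46 EF BF BD 72".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Decodes bytes with the given charset; bytes that are invalid in that
    // charset are replaced with U+FFFD by the String constructor.
    static String decode(byte[] raw, Charset cs) {
        return new String(raw, cs);
    }

    public static void main(String[] args) {
        // "Für" as ISO-8859-1 bytes: 0xFC on its own is not valid UTF-8
        byte[] latin1 = {0x46, (byte) 0xFC, 0x72};
        String wrong = decode(latin1, StandardCharsets.UTF_8);       // F, U+FFFD, r
        String right = decode(latin1, StandardCharsets.ISO_8859_1);  // F, ü, r

        System.out.println(hex(wrong.getBytes(StandardCharsets.UTF_8))); // 46 EF BF BD 72
        System.out.println(hex(right.getBytes(StandardCharsets.UTF_8))); // 46 C3 BC 72
    }
}
```

Once the U+FFFD is in the string, any correct UTF-8 quoted-printable encoder will faithfully produce `=EF=BF=BD`; the fix has to happen where the bytes are first decoded.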

I'm getting the string from a JavaMail MimeMessage using a ByteArrayOutputStream like so:

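import java.io.ByteArrayOutputStream;

ByteArrayOutputStream baos = new ByteArrayOutputStream();
message.writeTo(baos);        // message is a javax.mail.internet.MimeMessage
String s = baos.toString();   // decodes the raw bytes with the platform default charset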
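
`ByteArrayOutputStream.toString()` without arguments decodes the bytes with the platform default charset, so its result depends on the JVM's environment rather than on the message. The sketch below uses a hypothetical `fakeWriteTo()` as a stand-in for `message.writeTo(baos)`, assuming an 8bit body that holds raw UTF-8 bytes, and shows how the same bytes read with different charsets:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RawMessageDecodeSketch {

    // Hypothetical stand-in for message.writeTo(baos): writes an 8bit body
    // consisting of the raw UTF-8 bytes of "Für" (46 C3 BC 72).
    static ByteArrayOutputStream fakeWriteTo() {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] body = "F\u00FCr".getBytes(StandardCharsets.UTF_8);
        baos.write(body, 0, body.length);
        return baos;
    }

    // Decodes the collected bytes with an explicitly chosen charset.
    static String decodeRaw(ByteArrayOutputStream baos, Charset cs) {
        return new String(baos.toByteArray(), cs);
    }

    public static void main(String[] args) {
        ByteArrayOutputStream baos = fakeWriteTo();

        String asUtf8 = decodeRaw(baos, StandardCharsets.UTF_8);        // F, ü, r
        String asLatin1 = decodeRaw(baos, StandardCharsets.ISO_8859_1); // F, Ã, ¼, r (mojibake)
        String asDefault = baos.toString();  // depends on the platform default charset

        System.out.println(asUtf8.equals("F\u00FCr"));   // true
        System.out.println(asLatin1.equals("F\u00FCr")); // false
        System.out.println(asDefault);
    }
}
```

Decoding with an explicit `Charset` (ideally the one from the part's `Content-Type`) removes the dependency on the platform default.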

On the initial store operation, I receive a string containing a literal � (the Unicode replacement character U+FFFD, whose UTF-8 quoted-printable sequence is =EF=BF=BD) instead of ü. However, on every subsequent request Thunderbird makes (e.g. copying the message to Sent), I receive the correct ü. Is that something I can fix?

What I would like to get is the two-byte representation, as expected by IMAP and the mail clients involved. How would I go about that?
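
Whether `=FC` or `=C3=BC` is right depends on the `charset` parameter of the part's `Content-Type`: a client first turns the `=XX` escapes back into bytes and then interprets those bytes in the declared charset. The hypothetical, simplified decoder below (it does not handle soft line breaks) illustrates this:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QpDecodeSketch {

    // Simplified quoted-printable decoder: turns "=XX" into the byte 0xXX and
    // copies every other character as-is. Soft line breaks ("=" at the end of
    // a line) are NOT handled; this is for illustration only.
    static byte[] qpDecode(String qp) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < qp.length(); i++) {
            char c = qp.charAt(i);
            if (c == '=' && i + 2 < qp.length()) {
                out.write(Integer.parseInt(qp.substring(i + 1, i + 3), 16));
                i += 2;                       // skip the two hex digits
            } else {
                out.write(c);                 // plain ASCII character
            }
        }
        return out.toByteArray();
    }

    // Decodes the QP text, then interprets the bytes in the declared charset.
    static String decode(String qp, Charset declaredCharset) {
        return new String(qpDecode(qp), declaredCharset);
    }

    public static void main(String[] args) {
        String fuer = "F\u00FCr"; // "Für"
        System.out.println(decode("F=C3=BCr", StandardCharsets.UTF_8).equals(fuer));   // true
        System.out.println(decode("F=FCr", StandardCharsets.ISO_8859_1).equals(fuer)); // true
        System.out.println(decode("F=FCr", StandardCharsets.UTF_8).equals(fuer));      // false
    }
}
```

So `=C3=BC` is the form that goes with `charset=utf-8`, while `=FC` is only correct for a part declared as `charset=iso-8859-1`.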

I get `F=C3=BCr`. It shouldn’t be possible for getBytes(UTF_8) to produce a standalone FC byte. — VGR, Nov 24 '18 at 18:44
@VGR Oh yes, that's right. However, I get a strange string when receiving from a `MimeMessage`; I've updated my question. — Alexander Leithner, Nov 25 '18 at 10:41
`baos.toString()` is most likely your problem, as it uses the platform’s default charset to decode the bytes. I would use something like `String s = (String) message.getContent()`, which takes the message’s charset into account. — VGR, Nov 25 '18 at 15:58
Sadly, that doesn't help: the `MimeMessage` returns a string that literally contains `�`, although the message's `Content-Type` header is set to `text/plain; charset=utf-8`, which *should* work. (The subsequent requests made by Thunderbird include the correct `ü`, no matter how I retrieve the message's contents. However, I need it to work on the very first request.) — Alexander Leithner, Nov 26 '18 at 13:55
