Java MessageFormat.format with umlauts (ä / ö / ü)

Question

I have a problem with MessageFormat.format in my Java backend. It has a mailing function that sends e-mails to some users, with content that comes from my frontend (passed to the backend via an API).

String text =
    MessageFormat.format(
        "Dear Report Owner\n\nA new access request:\n\nFrom: {0} {1} ({2})\nFor: {3} \nReason: {4}\n\nPlease process the access request and inform {0} {1} accordingly.\n\nBest regards,\nDev-Team",
        accessTokenUser.getGivenName(),
        accessTokenUser.getFamilyName(),
        accessTokenUser.getEmail(),
        processedRoleContent,
        processedLinkContent);

Some values (e.g. processedRoleContent) may contain an umlaut such as ü, but in the sent e-mail it appears as the HTML entity &uuml;.

How can I configure MessageFormat.format so that it sends umlauts?

Thank you in advance!

1. Are you sure that the UTF-8 character "ü" arrives at your backend? 2. Are you sure that `MessageFormat.format` converts "ü" into the HTML entity and not your mailing function? — Smutje, Mar 27 '20 at 08:47
MessageFormat is unrelated to that issue. The conversion from ü to `&uuml;` is done by another component, on the String text produced. You need to identify that other component and modify it so that it doesn't make that replacement. — kumesana, Mar 27 '20 at 09:00
What a surprise, a method that is called "escapeHtml4" really escapes things — Smutje, Mar 27 '20 at 09:31
@Smutje it's my first Java project, so I'm not sure what this method does. How can I fix it, and what does it actually do? — Mike, Mar 27 '20 at 09:33
If it's `org.apache.commons.text.StringEscapeUtils`, it is open source, so you can look at the code. As the name suggests, it escapes text according to the HTML 4.0 character entity list (https://www.w3.org/TR/REC-html40/sgml/entities.html), which turns characters such as ü into named entities like &uuml;. — Smutje, Mar 27 '20 at 09:37
@Smutje yes, it's `org.apache.commons.text.StringEscapeUtils`. Does it also have an HTML5 variant, or do you know whether that is even necessary? Could I just change the method to `return noneEmpty` so that it works as well? — Mike, Mar 27 '20 at 09:44
@Smutje I found this workaround, but I don't like the solution (https://stackoverflow.com/questions/34927373/replacing-of-html-5-codes-with-equivalent-characters-in-java). Do you know another way, or should I just delete it? — Mike, Mar 27 '20 at 09:51
If your target is HTML 5 and you have a UTF-8 String, you might skip the whole escape routine completely. — Smutje, Mar 27 '20 at 10:00
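If the mail body really is HTML 5 sent as UTF-8, user input such as processedRoleContent may still need protection against markup characters, but umlauts do not have to become entities. A minimal sketch, assuming a hand-written helper (the names MarkupEscaper and escapeMarkupOnly are hypothetical and not part of commons-text), escapes only &, <, >, " and ' and passes every other character through unchanged:

```java
import java.text.MessageFormat;

// Hypothetical helper (not part of any library): escapes only the characters
// that are significant in HTML markup and leaves letters such as ü untouched.
class MarkupEscaper {

    static String escapeMarkupOnly(String input) {
        if (input == null) {
            return ""; // assumption: treat a missing value as empty text
        }
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c); // umlauts and other non-ASCII chars pass through
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String role = escapeMarkupOnly("Prüfer <Admin>");
        String text = MessageFormat.format("For: {0}", role);
        // text is: For: Prüfer &lt;Admin&gt;
        System.out.println(text);
    }
}
```

This only stays correct if the mail is actually delivered as UTF-8 (for example with a charset=UTF-8 content type); how to set that depends on the mail library, which the question does not name.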

score 1 · Answer 1 · answered Mar 27 '20 at 08:50

Consider the following minimal demonstration of why I think MessageFormat.format has nothing to do with your problem:

import java.text.MessageFormat;

public class Application {

    public static void main(String[] args) {
        System.out.println(MessageFormat.format("{0}", "ü"));
    }
}

On my machine, this prints ü.

So I think your e-mail function escapes umlauts as HTML entities.
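To find the component that introduces the entity, as suggested in the comments, one can inspect the string right after MessageFormat.format and before it reaches the mail code. The sketch below uses a shortened version of the question's template with made-up sample values; buildText is a hypothetical helper:

```java
import java.text.MessageFormat;

class FormatCheck {

    // Hypothetical helper: a shortened version of the template from the question.
    static String buildText(String givenName, String familyName, String email,
                            String role, String reason) {
        return MessageFormat.format(
            "From: {0} {1} ({2})\nFor: {3} \nReason: {4}",
            givenName, familyName, email, role, reason);
    }

    public static void main(String[] args) {
        String text = buildText("Jürgen", "Müller", "j.mueller@example.com",
                                "Prüfer", "Quarterly report");
        // MessageFormat inserts String arguments verbatim, so the umlauts are
        // still present and no HTML entity has been produced at this step.
        System.out.println(text.contains("ü"));      // true
        System.out.println(text.contains("&uuml;")); // false
    }
}
```

If both checks hold here but the received mail shows &uuml;, the entity is introduced later, for example by an escaping call such as escapeHtml4.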

score 0 · Answer 2 · answered Mar 27 '20 at 08:48


You can convert any String to bytes and back to a String using UTF-8:

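// out is assumed to be a ByteArrayOutputStream holding UTF-8 encoded text
new String(out.toByteArray(), java.nio.charset.StandardCharsets.UTF_8)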
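The byte conversion in this answer only matters when bytes are decoded with a different charset than they were encoded with. The sketch below (class and method names are illustrative) passes StandardCharsets.UTF_8 instead of the string "UTF8", which avoids the checked UnsupportedEncodingException thrown by the String-name overload. It also shows what a charset mismatch looks like, and that a round trip leaves an already escaped &uuml; unchanged:

```java
import java.nio.charset.StandardCharsets;

class CharsetRoundTrip {

    // Encodes text to UTF-8 bytes and decodes it back with the same charset.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Decoding UTF-8 bytes with the wrong charset produces mojibake instead.
    static String decodeAsLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String word = "Grüße";
        System.out.println(roundTrip(word).equals(word)); // true
        // ü is the two bytes C3 BC in UTF-8; read as Latin-1 they become "Ã¼"
        System.out.println(decodeAsLatin1("\u00FC"));
        // A round trip does not turn an HTML entity back into a character:
        System.out.println(roundTrip("&uuml;"));          // &uuml;
    }
}
```

So re-decoding the bytes would not help in the situation from the question, where the text itself is correct but has been HTML-escaped.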

Obyvante

If you read the example, it turns ü into a different character sequence, and the question is how it can send ü without that conversion. – Obyvante Mar 27 '20 at 08:59
@user85421 it is a byte array. You can convert a String to bytes using string.getBytes(). – Obyvante Mar 27 '20 at 09:11
If you don't want to do that, you can just change your IDEA settings to UTF-8. You have to convert the text to bytes in order to convert it again to another type. Bytes don't change: if a equals b, then "a" always equals "b". – Obyvante Mar 27 '20 at 09:23
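Changing the IDE encoding only affects how string literals in the .java files are read by the compiler (outside the IDE, the equivalent is javac's -encoding option); text that arrives through the API is not affected by it. A small sketch (names are illustrative) that avoids depending on the source-file encoding by writing the umlaut as the Unicode escape \u00FC, and prints the platform default charset that getBytes() and new String(byte[]) use when no charset is passed:

```java
import java.nio.charset.Charset;

class SourceEncodingCheck {

    // A Unicode escape is translated by the compiler before the source is parsed,
    // so this constant is the letter ü no matter how the .java file was saved.
    static final String U_UMLAUT = "\u00FC";

    public static void main(String[] args) {
        // Platform default charset: used by getBytes() and new String(bytes)
        // overloads that do not take an explicit charset.
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("Code point: " + (int) U_UMLAUT.charAt(0)); // 252
    }
}
```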
