I'm working on a Spring web app that will need to handle multiple non-English languages in the future. My concern is that many characters with diacritical marks, and some special characters commonly used in other languages, have more than one Unicode representation (e.g. a precomposed code point vs. a base letter plus combining mark), so I will need to normalize all user input to ensure that canonically equivalent strings actually evaluate as equal.
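To illustrate the problem, here is a minimal example using plain `java.text.Normalizer` (nothing Spring-specific): the precomposed and decomposed forms of "é" only compare as equal after both are normalized to NFC.

```java
import java.text.Normalizer;

public class NfcEqualityDemo {
    public static void main(String[] args) {
        String composed = "\u00E9";     // é as a single precomposed code point (U+00E9)
        String decomposed = "e\u0301";  // e followed by U+0301 COMBINING ACUTE ACCENT

        // Canonically equivalent, but not equal as raw strings
        System.out.println(composed.equals(decomposed)); // false

        // Equal once both are normalized to NFC
        String a = Normalizer.normalize(composed, Normalizer.Form.NFC);
        String b = Normalizer.normalize(decomposed, Normalizer.Form.NFC);
        System.out.println(a.equals(b)); // true
    }
}
```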
I ran some brief experiments, examining String input unmarshalled from a text input on a form already on the site, and found that no normalization appears to be happening at the moment. I had expected Spring might normalize inputs already, but that does not seem to be the case (or it has been disabled somehow).
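For reference, my experiment was roughly the following throwaway handler (the mapping path and parameter name are just placeholders I made up), which reports whether the bound value arrives already in NFC:

```java
import java.text.Normalizer;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class NormalizationCheckController {

    // Hypothetical endpoint: reports whether the submitted value is already in NFC
    @PostMapping("/normalization-check")
    public String check(@RequestParam("name") String name) {
        boolean alreadyNfc = Normalizer.isNormalized(name, Normalizer.Form.NFC);
        return "already NFC: " + alreadyNfc;
    }
}
```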
What I want to know is whether there is some way to have all String user input, from any element on the site, normalized to Unicode Normalization Form C (NFC) before any other operations are performed on it.
The most relevant result I found searching here is this unanswered question from 2013. I'm still learning Spring, and the only approach I have found that might work is to define a custom HttpMessageConverter (similar to the demonstration here) that normalizes Strings as part of the conversion, and then make sure that converter is applied to all incoming JSON. However, I'm not sure whether that would cover every type of input, or whether I would have to define a custom converter in place of every available converter (my site doesn't have any converters enabled beyond the list in 2.2 at that link). I'm really hoping there is something more universal and less messy than doing it that way, assuming it would even work. A sketch of what I had in mind for the JSON side follows below.
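In case it helps clarify the idea, here is a rough sketch of the JSON side. Rather than replacing the converter wholesale, it registers a custom String deserializer on the Jackson ObjectMapper used by the existing MappingJackson2HttpMessageConverter (assuming Spring MVC 5+ with Jackson on the classpath; the class names are just ones I made up):

```java
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.DeserializationContext;
import com.fasterxml.jackson.databind.JsonDeserializer;
import com.fasterxml.jackson.databind.module.SimpleModule;
import org.springframework.context.annotation.Configuration;
import org.springframework.http.converter.HttpMessageConverter;
import org.springframework.http.converter.json.MappingJackson2HttpMessageConverter;
import org.springframework.web.servlet.config.annotation.WebMvcConfigurer;

import java.io.IOException;
import java.text.Normalizer;
import java.util.List;

@Configuration
public class NfcJacksonConfig implements WebMvcConfigurer {

    // Normalizes every String that Jackson deserializes from a request body to NFC
    static class NfcStringDeserializer extends JsonDeserializer<String> {
        @Override
        public String deserialize(JsonParser p, DeserializationContext ctxt) throws IOException {
            String value = p.getValueAsString();
            return value == null ? null : Normalizer.normalize(value, Normalizer.Form.NFC);
        }
    }

    @Override
    public void extendMessageConverters(List<HttpMessageConverter<?>> converters) {
        SimpleModule module = new SimpleModule();
        module.addDeserializer(String.class, new NfcStringDeserializer());

        // Attach the module to the existing Jackson converter instead of replacing it
        for (HttpMessageConverter<?> converter : converters) {
            if (converter instanceof MappingJackson2HttpMessageConverter) {
                ((MappingJackson2HttpMessageConverter) converter).getObjectMapper().registerModule(module);
            }
        }
    }
}
```

As far as I can tell, though, something like this would only affect @RequestBody JSON, not form fields or query parameters bound via @RequestParam or @ModelAttribute, which is exactly my concern about whether the converter approach covers every type of input.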