Windows to UTF-8 Character Encoding Behaviour Query

Question

A simple query about the expected behaviour when Java source containing Windows-1252 characters is compiled on a system whose default charset is UTF-8. When building Java source code with an Ant task, some odd character encoding seems to occur.

For certain fields, characters that are normally encoded as \u2013 on the Windows machine, for example, turn into \226 on Linux. What is the explanation for the \226? Will it still be rendered correctly in a browser, for example?
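One possible reading of the \226, offered as a sketch rather than a diagnosis: if \226 is an octal escape for a single byte, it equals 0x96, which is the byte Windows-1252 uses for U+2013 (the en dash). The snippet below prints the bytes of that character under both encodings as octal escapes. The class name `EnDashBytes` and the helper `toOctalEscapes` are made up for illustration, and it assumes the `windows-1252` charset is available in the running JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EnDashBytes {
    // Formats each byte as a three-digit octal escape (hypothetical helper).
    static String toOctalEscapes(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            // Mask to 0..255 so negative byte values print as unsigned octal.
            sb.append('\\').append(String.format("%03o", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // En dash, written as a Unicode escape so this file is pure ASCII.
        String enDash = "\u2013";
        // Assumption: windows-1252 is supported by this JVM.
        Charset cp1252 = Charset.forName("windows-1252");

        System.out.println(toOctalEscapes(enDash.getBytes(cp1252)));
        System.out.println(toOctalEscapes(enDash.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Run as-is, this should print `\226` for Windows-1252 and `\342\200\223` for UTF-8, which would be consistent with the en dash having been written out as a single cp1252 byte somewhere in the build.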

What do you mean by "compiling characters"? It seems that they are being converted to cp1252 (or [some other cp125x encoding](https://cdn.rawgit.com/tripleee/8bit/master/encodings.html#96)) or maybe they are wrong in the source in the first place. They probably will not render correctly, unless the web server correctly identifies the character set to the client. — tripleee, May 01 '15 at 11:36
Sorry for the phrasing! When the characters are interpreted by the Java compiler, it spits out different results, seemingly based on the default charset of the system, so Windows would be CP1252 and Linux UTF-8. As part of my Ant scripts I've set encoding="cp1252" as a parameter on the javac and javadoc steps in order to standardise this, but it seems to be giving different results on Windows than on Linux. Maybe I'm overthinking things. — CoD, May 01 '15 at 12:03
For reliable cross-platform compilation, stick to plain old 7-bit ANSI in the source code files. You can use Unicode escapes for everything else. — Harry Johnston, May 01 '15 at 12:10
Hey Harry, we considered multiple options on that front and decided UTF-8 might be the best format to go for cross-platform compatibility. We do need some special characters not available in ANSI. We will eventually be converting all of our more nuisance characters (i.e. Windows-only ones) over to their UTF-8 counterparts. I assume as long as we enforce this encoding we should be fairly golden? — CoD, May 01 '15 at 13:00
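To illustrate the point running through the comments above, here is a small sketch of what happens when the bytes of a file are decoded with a charset other than the one it was saved in, which is roughly the situation when the compiler's source encoding does not match the file. It does not invoke javac or Ant; the class name `SourceDecodingDemo` and the helper `decode` are hypothetical, and it assumes the `windows-1252` charset is available in the running JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class SourceDecodingDemo {
    // Decodes raw file bytes the way a reader configured with the
    // given charset would (hypothetical helper).
    static String decode(byte[] fileBytes, Charset assumed) {
        return new String(fileBytes, assumed);
    }

    public static void main(String[] args) {
        // Assumption: windows-1252 is supported by this JVM.
        Charset cp1252 = Charset.forName("windows-1252");
        // Written as a Unicode escape, so this file is pure ASCII and
        // reads the same under either source encoding.
        String enDash = "\u2013";

        byte[] savedAsCp1252 = enDash.getBytes(cp1252);
        byte[] savedAsUtf8 = enDash.getBytes(StandardCharsets.UTF_8);

        // Declared encoding matches the saved encoding: text survives.
        System.out.println(enDash.equals(decode(savedAsCp1252, cp1252)));
        System.out.println(enDash.equals(decode(savedAsUtf8, StandardCharsets.UTF_8)));

        // Declared encoding differs from the saved encoding: text is corrupted.
        System.out.println(enDash.equals(decode(savedAsCp1252, StandardCharsets.UTF_8)));
        System.out.println(enDash.equals(decode(savedAsUtf8, cp1252)));
    }
}
```

This should print `true`, `true`, `false`, `false`. The takeaway, under these assumptions, is that enforcing one encoding only helps if the files are actually saved in that encoding; a literal written as a Unicode escape sidesteps the question because the source bytes are plain ASCII.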

