1

I have a form where the user enters Chinese/Japanese characters, and I compare the input with a declared value. The problem is that even when the input and the declared value look the same, the comparison says they are not equal.

A hard-coded variable such as variableA = "官话" is not equal to the form input text, even though the input displays as "官话" when printed.

Even their lengths aren't equal! new String("官话").length() is not equal to
formInputtedCharacter.length(), where the input, when printed, is "官话" (it is already UTF-8).

How could this be?

Makoto
ianrey palo

2 Answers

2

The most likely cause (if you're sure the form data is processed correctly) is that the Java compiler is using the wrong encoding when processing your literal. Make sure that it uses the same encoding as whatever you use to edit your source code.
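As a rough illustration of that failure mode, the sketch below imitates a compiler that reads a UTF-8 source file as ISO-8859-1. The class name LiteralEncodingDemo and the helper misdecode are made up for this example, and ISO-8859-1 is only an assumed "wrong" charset; the actual platform default may differ. The literal is written with Unicode escapes so the demo itself does not depend on the source file encoding.

```java
import java.nio.charset.StandardCharsets;

public class LiteralEncodingDemo {

    // Hypothetical helper: takes the UTF-8 bytes of s and decodes them as
    // ISO-8859-1, which is what happens to a string literal when the
    // compiler assumes the wrong source encoding.
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The two characters from the question, as Unicode escapes.
        String expected = "\u5B98\u8BDD";
        String garbled = misdecode(expected);

        System.out.println(expected.length());        // 2
        System.out.println(garbled.length());         // 6 (one char per UTF-8 byte)
        System.out.println(expected.equals(garbled)); // false
    }
}
```

If this matches what you see, two possible fixes are passing the source encoding explicitly (for example javac -encoding UTF-8) or writing the literal with Unicode escapes as above.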

Michael Borgwardt
1

Java Strings are represented in UTF-16, which encodes each Unicode character as one or two 16-bit code units (2 or 4 bytes).

It seems that there are either two different Unicode characters for 官话 or a character encoding issue. Perhaps one Chinese and one Japanese character happen to look identical, or similar? If they are two distinct Unicode characters, they have two different byte representations, so they are not equal in Java.
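One way to tell these cases apart is to print the code points of both strings and compare them. The following is a minimal sketch; the class name CodePointDump and the method describe are hypothetical names chosen for this example. It also shows that length() counts UTF-16 code units, not characters.

```java
import java.util.stream.Collectors;

public class CodePointDump {

    // Hypothetical helper: renders each code point of s as U+XXXX so that
    // strings which merely look alike can be compared character by character.
    static String describe(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // The two characters from the question, as Unicode escapes.
        String literal = "\u5B98\u8BDD";
        System.out.println(describe(literal)); // U+5B98 U+8BDD

        // A character outside the Basic Multilingual Plane takes two
        // UTF-16 code units (a surrogate pair) but is one code point.
        String supplementary = "\uD842\uDFB7";
        System.out.println(supplementary.length());                                  // 2
        System.out.println(supplementary.codePointCount(0, supplementary.length())); // 1
        System.out.println(describe(supplementary));                                 // U+20BB7
    }
}
```

Calling describe on both the hard-coded value and the form input should show whether they differ by a lookalike character or whether one of them was decoded with the wrong charset (the latter typically yields more code points than expected).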

Johan Sjöberg