Here is what happens.

The user types in "лос ан".

I have a bunch of products whose location is "лос анджелис".

If I do:

String userInput = "лос ан";
for (Product product : products) {
    // Case-insensitive substring check against the trimmed city name
    if (product.getCity().trim().toLowerCase().contains(userInput.trim().toLowerCase())) {
        System.out.println("MATCH");
    }
}

I don't get MATCH.

The same code works for Latin characters.

Kaloyan Roussev
  • The problem probably doesn't come from `contains` but from `toLowerCase` (a locale issue). – Tunaki Sep 17 '15 at 11:44
  • So what should I use instead of toLowerCase? Can I do some kind of contains that ignores case? – Kaloyan Roussev Sep 17 '15 at 11:45
  • The problem is the same: ignoring case. This is locale-dependent, since the same character can be lowercased differently depending on the locale. You need to ask the user their language and use it accordingly. Please refer to this answer: http://stackoverflow.com/a/11063161/1743880 – Tunaki Sep 17 '15 at 11:47
  • Are you sure there is a match? I tried it here: http://ideone.com/c5UDiv and it works. – user902383 Sep 17 '15 at 12:10

3 Answers

Try specifying a Locale in toLowerCase() on both sides of the comparison: http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#toLowerCase(java.util.Locale)
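
A minimal sketch of that suggestion, assuming Locale.ROOT is an acceptable neutral choice when the user's language is not known. The class and method names (`CityMatcher`, `containsLowerCased`, `containsIgnoreCase`) are hypothetical. The second helper is an alternative that avoids toLowerCase altogether: String.regionMatches with ignoreCase set to true compares character by character and does not consult the default locale. The Cyrillic literals are written as u-escapes so the example does not depend on the source file encoding.

```java
import java.util.Locale;

class CityMatcher {

    // Lowercases both strings with the same explicit locale, then compares.
    static boolean containsLowerCased(String city, String input, Locale locale) {
        String haystack = city.trim().toLowerCase(locale);
        String needle = input.trim().toLowerCase(locale);
        return haystack.contains(needle);
    }

    // Alternative without toLowerCase: try every start offset and let
    // regionMatches ignore case character by character.
    static boolean containsIgnoreCase(String city, String input) {
        String haystack = city.trim();
        String needle = input.trim();
        for (int i = 0; i + needle.length() <= haystack.length(); i++) {
            if (haystack.regionMatches(true, i, needle, 0, needle.length())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The city from the question with capital letters, and the user input.
        String city = "\u041b\u043e\u0441 \u0410\u043d\u0434\u0436\u0435\u043b\u0438\u0441";
        String input = "\u043b\u043e\u0441 \u0430\u043d";

        System.out.println(containsLowerCased(city, input, Locale.ROOT)); // true
        System.out.println(containsIgnoreCase(city, input));              // true
    }
}
```

If neither helper reports a match for real data, the cause is more likely the data itself (encoding, look-alike characters, stray whitespace) than the lowercasing.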

Sharon Ben Asher
  • What locale should I specify for Bulgarian? Also, I don't know which other languages users are going to enter, so I want to be able to support most languages dynamically. – Kaloyan Roussev Sep 17 '15 at 11:44
  • Regarding "What locale should I specify for Bulgarian": I can give you an answer, but you can also do exactly what I did: use the Google search engine. – Sharon Ben Asher Sep 17 '15 at 11:46
  • Regarding "other languages are users going to be entering": you should request the locale settings that the user is using. If the input comes from a browser, there is usually an HTTP header (Accept-Language) that contains this info. Otherwise, you must make provision to get this info yourself. – Sharon Ben Asher Sep 17 '15 at 11:48

The editor and the compiler (javac -encoding) must use the same encoding.

The compiler encoding is easy to set. The editor's source encoding can be checked with a programmer's editor such as Notepad++ or jEdit, which can switch encodings.

You can also u-escape the Java source text to check this:

String userInput = "\u043b\u043e\u0441 \u0430\u043d";

If that does not work, there is a discrepancy between the encodings.

Furthermore, using String.toLowerCase(new Locale("ru", "RU")) or similar has already been mentioned.
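
For text that only exists at runtime, the same check can be approximated by dumping the string's UTF-16 code units and comparing them with the expected escapes. This is a sketch only; the class and method names (`UnicodeDump`, `toUnicodeEscapes`) are hypothetical, and it reports raw code units rather than full code points.

```java
class UnicodeDump {

    // Renders every UTF-16 code unit of the text as a four-digit hex escape.
    static String toUnicodeEscapes(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            sb.append(String.format("\\u%04x", (int) text.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a value read at runtime (user input, database field).
        String userInput = "\u043b\u043e\u0441 \u0430\u043d";
        System.out.println(toUnicodeEscapes(userInput));
    }
}
```

Dumping both the user's input and the stored city name this way shows whether the shared prefix really consists of the same characters, or whether one side contains look-alike letters or mangled bytes.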

Joop Eggen
  • The user input is not a hardcoded string; it's taken from an Android EditText input. How do I u-escape it? Also, I cannot assume that they will write in Russian, because next time someone will write in Greek or Armenian :( – Kaloyan Roussev Sep 17 '15 at 14:16
  • This "solution" only addresses **hard coded constants in Java** and checking the right encodings. In your case you probably should work in UTF-8, Unicode, and check everything from database, file system, locale. – Joop Eggen Sep 17 '15 at 14:22

Using JDK 1.8.0_45, the following code gives a match in both cases:

System.out.println("лос анджелис".trim().toLowerCase().contains("лос ан".trim().toLowerCase()));
System.out.println("лос анджелис".trim().toLowerCase(Locale.ROOT).contains("лос ан".trim().toLowerCase(Locale.ROOT)));

As others already mentioned, you may look for a working Locale as argument to String#toLowerCase.
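
To illustrate how much the Locale argument can matter, here is a small sketch comparing Locale.ROOT with a Turkish locale. The class and method names (`LocaleLowerCaseDemo`, `sameLowerCase`) are hypothetical. Turkish is a well-known case where Latin "I" lowercases differently; the Cyrillic sample, by contrast, lowercases identically under both locales, which is consistent with the result above.

```java
import java.util.Locale;

class LocaleLowerCaseDemo {

    // True if the text lowercases to the same string under both locales.
    static boolean sameLowerCase(String text, Locale first, Locale second) {
        return text.toLowerCase(first).equals(text.toLowerCase(second));
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Latin capital I becomes dotless i (U+0131) under Turkish rules.
        System.out.println("TITLE".toLowerCase(Locale.ROOT));                    // title
        System.out.println("TITLE".toLowerCase(turkish).equals("t\u0131tle"));   // true
        System.out.println(sameLowerCase("TITLE", Locale.ROOT, turkish));        // false

        // Upper-cased form of the question's input, written with u-escapes.
        String cyrillic = "\u041b\u041e\u0421 \u0410\u041d";
        System.out.println(sameLowerCase(cyrillic, Locale.ROOT, turkish));       // true
    }
}
```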

Binkan Salaryman