
So I am working on a project that involves searching for a word in text written in different languages. I can easily get the Locale of the language, but I don't know how to search for the word in another language: the text can be in Chinese, for example, while the word to be searched for is in English. In PHP we have grapheme_stripos, and I am looking for similar functionality in Java; I haven't found anything that does a grapheme-aware search in Java. One way might be to break the string down into a byte array and search through that, but isn't there something better, like grapheme_stripos in PHP, that serves the purpose?

Rohan
  • "Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it." – pvg Mar 21 '17 at 19:33
  • I have mentioned an alternative but am looking for a better solution. This is basically a theoretical question, so asking for a concept or idea doesn't seem wrong. And if you look closely, I am not asking for a library, tutorial, or tool, nor am I asking for code. I am just looking for a pointer in the right direction :) Thanks – Rohan Mar 21 '17 at 19:47
  • It's still basically asking people to google for you, which you can do yourself. Have you looked at what the standard tools provide? Which ones have you tried (https://mvnrepository.com/artifact/com.ibm.icu/icu4j ?) and what was missing, etc.? – pvg Mar 21 '17 at 19:59

1 Answer


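PHP's grapheme functions work on UTF-8, where one character can take one to four bytes, so searching for a grapheme is not trivial. Java strings are UTF-16 (early Java versions were effectively UCS-2), so most characters, namely all characters in the Basic Multilingual Plane (BMP), are one char wide. Some CJK characters lie outside the BMP, though, and take two chars (a surrogate pair). Also keep in mind that a code point is not the same as a grapheme: a letter followed by a combining accent is two code points but one user-perceived character.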
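To illustrate the difference between chars (UTF-16 code units), code points and graphemes, here is a small sketch. The class name CodeUnitsDemo and its counts helper are made up for this example; U+20000 is just a sample CJK character outside the BMP.

```java
import java.util.Arrays;

public class CodeUnitsDemo {

    // Returns {number of UTF-16 chars, number of Unicode code points} for s.
    public static int[] counts(String s) {
        return new int[] { s.length(), s.codePointCount(0, s.length()) };
    }

    public static void main(String[] args) {
        String bmp = "\u4E2D";                                          // a CJK character inside the BMP
        String supplementary = new String(Character.toChars(0x20000)); // CJK Extension B, outside the BMP
        String combining = "e\u0301";                                   // e + COMBINING ACUTE ACCENT, one grapheme

        System.out.println(Arrays.toString(counts(bmp)));           // [1, 1]
        System.out.println(Arrays.toString(counts(supplementary))); // [2, 1]
        System.out.println(Arrays.toString(counts(combining)));     // [2, 2]
    }
}
```

The last case is the one that neither char counting nor code point counting handles: two code points that a reader sees as a single character.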

Look at the code-point-related methods of java.lang.String (codePointAt, codePointCount, offsetByCodePoints, codePoints). Most of the time, indexOf and regionMatches (which has an ignoreCase overload) do the right thing.
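As a rough analogue of grapheme_stripos, here is a sketch that combines regionMatches(true, ...) with java.text.BreakIterator.getCharacterInstance, accepting a match only if it starts and ends on a character boundary (so not inside a surrogate pair, and not between a letter and its combining accent). The method name graphemeIndexOfIgnoreCase is hypothetical. Assumptions and limits: regionMatches compares case char by char and does not do full case folding (for example, "ß" will not match "SS"); no Unicode normalization is applied, so a precomposed "é" will not match "e" followed by U+0301 (java.text.Normalizer could be run on both strings first); the returned index counts chars, not graphemes; and the exact boundary rules of the JDK's character BreakIterator differ somewhat between Java versions.

```java
import java.text.BreakIterator;
import java.util.Locale;

public class GraphemeSearch {

    /**
     * Hypothetical helper loosely modeled on PHP's grapheme_stripos.
     * Returns the char index of the first case-insensitive occurrence of
     * needle in haystack that starts and ends on a character boundary,
     * or -1 if there is none. An empty needle matches at index 0.
     */
    public static int graphemeIndexOfIgnoreCase(String haystack, String needle, Locale locale) {
        if (needle.isEmpty()) {
            return 0;
        }
        BreakIterator boundaries = BreakIterator.getCharacterInstance(locale);
        boundaries.setText(haystack);

        int lastStart = haystack.length() - needle.length();
        for (int i = 0; i <= lastStart; i++) {
            // Skip candidate starts inside a surrogate pair or a combining sequence.
            if (!boundaries.isBoundary(i)) {
                continue;
            }
            // Case-insensitive, char-by-char comparison of the candidate region,
            // then reject matches that end in the middle of a character.
            int end = i + needle.length();
            if (haystack.regionMatches(true, i, needle, 0, needle.length())
                    && boundaries.isBoundary(end)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Chinese text containing an English word, searched with different case.
        String chinese = "\u8FD9\u662F\u4E00\u4E2Atest\u5B57\u7B26\u4E32";
        System.out.println(graphemeIndexOfIgnoreCase(chinese, "TEST", Locale.ROOT)); // 4

        // "e" is followed by a combining accent, so it is not a whole character here.
        System.out.println(graphemeIndexOfIgnoreCase("cafe\u0301", "e", Locale.ROOT)); // -1
    }
}
```

Checking every start position is quadratic in the worst case, which is fine for short strings; for large amounts of text, a dedicated full-text search solution is the better fit.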

Also, take a look at a dedicated full-text search solution.

9000