I ran the same code below on API level 16 and API level 21. On API level 16 the dictionary-based break iterator (tokenizer) does not seem to work, while on API level 21 it works properly.

    // vw is the TextView that displays the result
    BreakIterator it = BreakIterator.getWordInstance();
    String txt = "我们一起";
    it.setText(txt);
    int start = it.first();
    int end = it.next();
    StringBuffer buf = new StringBuffer();
    // Walk every boundary pair and append each non-blank segment followed by "+"
    while (end != BreakIterator.DONE) {
        String word = txt.substring(start, end).trim();
        if (!word.isEmpty()) {
            buf.append(word);
            buf.append("+");
        }
        start = end;
        end = it.next();
    }
    vw.setText(buf);

On API level 21, the text view shows the following ("我们" is one word and "一起" is one word):
我们+一起+
However, on API level 16 it shows the following (each Chinese character is treated as a separate word):
我+们+一+起+
So I suspect that API level 21 enables the dictionary-based iterator, while earlier API levels do not.
However, after searching the Android C++ source code, I found that the key function RuleBasedBreakIterator::checkDictionary is present in rbbi.cpp at both API levels, which suggests that both should support dictionary-based iteration. I also suspect that the difference may come from different category values being assigned to different character sets, but I have not been able to trace how these values are set or whether they differ between the two versions.
My question is: how can I further confirm that the implementation was enhanced in API level 21?
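To make the comparison easier to reproduce outside of a TextView, the loop above can be pulled into a small helper that only depends on java.text.BreakIterator. This is a minimal sketch, not a diagnosis: the class name WordSegments and the method join are hypothetical names introduced here for illustration, and it uses the default-locale word instance just like the snippet above.

```java
import java.text.BreakIterator;

// Hypothetical helper (name chosen for illustration only).
final class WordSegments {
    private WordSegments() {}

    // Splits text at word boundaries and joins the non-blank segments,
    // appending "+" after each one, mirroring the loop in the question.
    static String join(String text) {
        BreakIterator it = BreakIterator.getWordInstance();
        it.setText(text);
        StringBuilder out = new StringBuilder();
        int start = it.first();
        int end = it.next();
        while (end != BreakIterator.DONE) {
            String segment = text.substring(start, end).trim();
            if (!segment.isEmpty()) {
                out.append(segment).append('+');
            }
            start = end;
            end = it.next();
        }
        return out.toString();
    }
}
```

Calling WordSegments.join("我们一起") on each device or emulator and logging the result next to Build.VERSION.SDK_INT should reproduce the two outputs shown above, if the difference really comes from the platform's break iterator rather than from my view code.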