I'm trying to index some Chinese documents with Solr, but it looks like Solr doesn't index some segmented words.
The analyzer I use is the IK analyzer (http://code.google.com/p/ik-analyzer/).
The field to be indexed:
<field name="hospital_alias_splitted" type="cn_ik" indexed="true" stored="true" multiValued="true" omitNorms="false"/>
cn_ik definition:
<fieldType name="cn_ik" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="org.wltea.analyzer.solr.IKTokenizerFactory" useSmart="false"/>
  </analyzer>
</fieldType>
For example, the word to be indexed is "AB" (without quotes). After word segmentation with the Chinese analyzer, I get three tokens: "AB", "A", and "B".
As you can see, the first token "AB" overlaps the following two tokens.
After feeding these tokens to Solr, it seems that Solr only indexes "AB" and ignores "A" and "B", because searching for "A" or "B" returns no results.
My guess is that when Solr indexes "AB", it considers the word fully consumed, so the overlapping tokens "A" and "B" are dropped.
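For what it's worth, if the tokenizer really emits all three tokens with proper position increments, Lucene should index each of them independently. Here is a toy sketch (plain Python, not actual Solr/Lucene code; the position-increment values shown are my assumption about what an IK-style tokenizer might emit) of how overlapping tokens would land in separate posting lists:

```python
# Toy model of Lucene-style positional indexing (illustrative only).
# An overlapping token carries positionIncrement=0, so it shares a
# position with the previous token, but it still gets its own entry.
from collections import defaultdict

def index_tokens(tokens):
    """tokens: list of (term, position_increment) pairs."""
    index = defaultdict(list)  # term -> list of positions
    pos = -1
    for term, incr in tokens:
        pos += incr
        index[term].append(pos)
    return index

# Hypothetical tokenizer output for "AB": "AB" at position 0,
# "A" overlapping it (increment 0), then "B" at position 1.
index = index_tokens([("AB", 1), ("A", 0), ("B", 1)])
for term in ("AB", "A", "B"):
    print(term, index[term])  # every term is independently searchable
```

If Solr behaved like this sketch, a search for "A" or "B" would match, so something seems to go wrong between the tokenizer output and the index.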
Using Luke and the Analysis Request Handler didn't give me any more hints. I'm not sure whether this is a bug or intended Solr behavior.
Any comment or suggestion?
Thanks :)