This is the first time I am using synonyms in Solr, and I have been reading about them to understand how they work. Although there is a lot of documentation on the synonym filter factory and how it works, I couldn't find much on how to get started. So I started by modifying the synonyms.txt file and adding a line like
mba => master business administration
and defined the field 'degree' as text_en_splitting_tight, which uses SynonymFilterFactory by default.
When I search for the word mba in 'degree', I expect the input to be converted to master business administration and then matched against the entries in my index, but that doesn't happen. However, when I try it in the query section of the Solr analysis page, the conversion seems to work properly for both 'degree' and 'text_en_splitting_tight'.
- What can I do to check whether the input is being converted when the query comes from my PHP application?
- How can I effectively convert the synonyms entered by the user to one standard term and search against the entries in my index?
- Is there any way to access the query input after it has passed through the analyzer (the parsed input)? I am using Solarium as the PHP client.
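To make the first and third points concrete, this is roughly the check I have in mind, written as a small Python sketch outside the PHP client. It only relies on Solr's standard debugQuery parameter, which adds a debug section (including parsedquery) to the response. The host, port, and core name `candidates` are assumptions for illustration, not values from my setup.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed values for illustration: adjust host, port and core name.
SOLR_SELECT_URL = "http://localhost:8983/solr/candidates/select"


def build_debug_url(field, term, base_url=SOLR_SELECT_URL):
    """Build a select URL that asks Solr to return debug output as JSON."""
    params = {
        "q": f"{field}:{term}",
        "wt": "json",
        "debugQuery": "true",
        "rows": "0",  # only the debug section is of interest here
    }
    return f"{base_url}?{urlencode(params)}"


def extract_parsed_query(response):
    """Return the parsed query from a decoded Solr JSON response, if present."""
    return response.get("debug", {}).get("parsedquery")


def fetch_parsed_query(field, term):
    """Send the request and return what the query parser produced."""
    with urlopen(build_debug_url(field, term)) as handle:
        return extract_parsed_query(json.load(handle))
```

Calling `fetch_parsed_query("candDegree", "mba")` against a running instance should show whether the synonym was applied at query time. I would like to get the same information through Solarium.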
Please don't mind if the question is too amateurish, but I am really struggling to find a way through this. Please feel free to point out any important steps I am missing.
EDIT: Adding the relevant part of schema.xml below:
<field name="candDegree" type="text_en_splitting_tight" indexed="true" stored="true" multiValued="true"/>
<field name="candStream" type="text_en_splitting_tight" indexed="true" stored="true" multiValued="true"/>
EDIT 2: The field type analyzer looks like this:
<fieldType name="text_en_splitting_tight" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <!-- this filter can remove any duplicate tokens that appear at the same position - sometimes
         possible with WordDelimiterFilter in conjunction with stemming. -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
The synonyms.txt looks like this:
mba => master business administration
mcom => master commerce
me,mtech,ms => master engineering
mit,mca => master information technology
ma => master humanities
bba,bbm => bachelor business administration
bcom => bachelor commerce
be,btech => bachelor engineering
bca,bit => bachelor information technology
ba => bachelor humanities
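To be explicit about what I expect these rules to do, here is a toy Python model of the mapping. It is not how Solr implements SynonymFilterFactory, just a sketch of the behavior I am assuming: every token on the left of `=>` is replaced by the words on the right. It only handles single-token left-hand sides and treats the right-hand side as one multi-word replacement. The function names are my own.

```python
def parse_rules(text):
    """Parse explicit 'a,b => x y' lines into {source token: replacement tokens}."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines that are not explicit mappings.
        if not line or line.startswith("#") or "=>" not in line:
            continue
        left, right = line.split("=>", 1)
        replacement = right.split()
        for source in left.split(","):
            rules[source.strip().lower()] = replacement
    return rules


def apply_rules(tokens, rules):
    """Replace each matching token with its replacement, ignoring case."""
    output = []
    for token in tokens:
        output.extend(rules.get(token.lower(), [token]))
    return output


RULES = parse_rules("""
mba => master business administration
me,mtech,ms => master engineering
""")
```

With these rules, `apply_rules(["mba"], RULES)` gives `["master", "business", "administration"]`, which is the conversion I expect to happen before the query is matched against the index.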