
This is the first time I am using synonyms in Solr, and I have been reading about them to understand how they work. Although there is a lot of documentation on the synonym filter factory and how it works, I couldn't find much on how to get started. So I started by modifying the synonyms.txt file and adding a line like

mba => master business administration

and defined the field 'degree' as text_en_splitting_tight, which uses the SynonymFilterFactory by default.

When I search for the word mba in 'degree', I expect Solr to convert the input to master business administration and then match it against the entries in my index, but that doesn't happen. However, when I try it in the query section of Solr's analysis page, the conversion seems to work properly for both 'degree' and 'text_en_splitting_tight'.

  1. How can I check whether the input is being converted when the query comes from my PHP application?
  2. How can I effectively convert the synonyms entered by the user to one standard term and search for it in my index?
  3. Is there any way to access the query input after it has passed through the analyzer (the parsed input)? I am using Solarium as the PHP client.
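
For question 3, one option I am aware of is Solr's `debugQuery=true` request parameter, which adds a `debug` section (including `parsedquery`) to the response. Below is a minimal sketch that only builds such a request URL so it can be opened in a browser and compared with what the PHP application sends. The host, port, and core name `collection1` are assumptions, not my actual setup; the field name `candDegree` is taken from my schema.

```python
from urllib.parse import urlencode


def build_debug_url(base_url, field, term):
    """Return a Solr select URL that asks for debug output for field:term."""
    params = {
        "q": f"{field}:{term}",
        "wt": "json",
        "debugQuery": "true",  # asks Solr to add a "debug" section to the response
    }
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and core name; adjust to the real installation.
url = build_debug_url("http://localhost:8983/solr/collection1", "candDegree", "mba")
print(url)
```

If the synonym is applied at query time, the `parsedquery` entry in the `debug` section should show the expanded terms instead of `mba`.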

Please don't mind if the question is too amateurish, but I am really finding it hard to find my way through this. Please feel free to point out any important steps I am missing.

EDIT: Adding the part of the schema.xml below

<field name="candDegree" type="text_en_splitting_tight" indexed="true" stored="true" multiValued="true"/>
<field name="candStream" type="text_en_splitting_tight" indexed="true" stored="true" multiValued="true"/>

EDIT 2: The Field type analyzer goes like this:

<fieldType name="text_en_splitting_tight" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <!-- this filter can remove any duplicate tokens that appear at the same position - sometimes
         possible with WordDelimiterFilter in conjunction with stemming. -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

The synonyms.txt goes like this:

mba => master business administration
mcom => master commerce
me,mtech,ms => master engineering
mit,mca => master information technology
ma => master humanities


bba,bbm => bachelor business administration
bcom => bachelor commerce
be,btech => bachelor engineering
bca,bit => bachelor information technology
ba => bachelor humanities
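
To make clear what I expect these rules to do, here is a small sketch in Python that mimics an explicit `=>` mapping on a list of tokens. It is a simplified model and not Solr's implementation: it ignores token positions, multi-word left-hand sides, and comma-separated right-hand sides, and the function names are my own.

```python
def parse_synonyms(text):
    """Parse explicit 'a,b => x y z' rules into {source_token: [replacement tokens]}."""
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and lines that are not explicit mappings.
        if not line or line.startswith("#") or "=>" not in line:
            continue
        left, right = line.split("=>", 1)
        replacement = right.split()
        for source in left.split(","):
            rules[source.strip().lower()] = replacement
    return rules


def apply_synonyms(tokens, rules):
    """Replace each token that has a rule with its replacement tokens."""
    out = []
    for token in tokens:
        out.extend(rules.get(token.lower(), [token]))
    return out


SAMPLE = """
mba => master business administration
me,mtech,ms => master engineering
"""

rules = parse_synonyms(SAMPLE)
print(apply_synonyms(["mba"], rules))                # ['master', 'business', 'administration']
print(apply_synonyms(["MTech", "graduate"], rules))  # ['master', 'engineering', 'graduate']
```

Note that a single input token turns into several output tokens, which is the multi-word situation I am struggling with.
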
abhilashLenka
  • I think your problem is related to this question http://stackoverflow.com/questions/12217024/solr-synonyms-containing-multiple-words as you also have multiple words (`mba` -> `master business administration`) – cheffe Apr 16 '14 at 16:00
  • If this does not help, would you share the relevant parts of your solrconfig.xml and schema.xml? – cheffe Apr 16 '14 at 16:02
  • Hello @cheffe, I have shared the relevant part of schema.xml in my edit. Would you be kind enough to tell me which part of solrconfig.xml I should put up? The above link did teach me a lot about using synonyms but unfortunately didn't offer a solution. When I try using comma-separated synonyms, the results are highly irrelevant and often only one of the two fields is considered. As you can see, "master" and "business administration" are indexed in two different fields, viz. candDegree and candStream, and the query needs to take both terms at once to return a relevant result. – abhilashLenka Apr 18 '14 at 04:52
  • I would also need the typeDefinition of your custom type `text_en_splitting_tight` and all files that are related, like `synonyms.txt`. From the solrconfig.xml the requestHandler you use would be useful. To understand your problem fully, I would need to reproduce it ... – cheffe Apr 18 '14 at 09:12
  • @cheffe: I have posted the required details in my new edit. I am using the default request handlers that ship with Solr 4.6.1, as I have very minimal knowledge of request handlers and haven't made any changes there. – abhilashLenka Apr 18 '14 at 09:56
  • Also, I am using the DataImportHandler to import from a MySQL data source through a JDBC driver, in case this information is required. – abhilashLenka Apr 18 '14 at 10:08
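
To make the two-field requirement from my earlier comment concrete: a possible client-side workaround (this is not something the synonym filter does, just an idea I am considering) would be to expand the abbreviation in the application and send one clause per field. The lookup table and function name below are hypothetical; only the field names `candDegree` and `candStream` come from my schema, and the sketch does not escape special query characters.

```python
# Hypothetical lookup: abbreviation -> (degree level, stream), mirroring synonyms.txt.
DEGREE_MAP = {
    "mba": ("master", "business administration"),
    "bcom": ("bachelor", "commerce"),
}


def build_query(user_term):
    """Return a query string that requires both fields to match, or a plain fallback."""
    key = user_term.strip().lower()
    if key not in DEGREE_MAP:
        # Unknown abbreviation: search the degree field only.
        return f'candDegree:"{key}"'
    degree, stream = DEGREE_MAP[key]
    return f'candDegree:"{degree}" AND candStream:"{stream}"'


print(build_query("MBA"))  # candDegree:"master" AND candStream:"business administration"
```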

0 Answers