I'm using Solr as a search engine in my site and all is going well except for synonym matching.

My synonyms.txt file looks like:

uk => united kingdom,england,scotland,wales

This works for returning results marked "United Kingdom" but not for the others. If I reverse the ordering then "United Kingdom" results aren't returned.

My fieldtype looks like this:

<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" tokenizerFactory="solr.KeywordTokenizerFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
  </analyzer>
</fieldType>

I'm pretty new to Solr, so any help is much appreciated!

Gerard
    Did you try using the [Debugging feature](http://wiki.apache.org/solr/CommonQueryParameters#Debugging)? For example: ../?q=keyword&debugQuery=true. You can also see how your field type and data behave on the analysis page http:///solr/admin/analysis.jsp?highlight=on. – mailboat Aug 16 '12 at 15:59

2 Answers

In the wiki, it is recommended to use SynonymFilter only at index time. Also, try setting the "expand" flag to true, which again is the recommended approach for dealing with multi-word synonyms.
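
A minimal sketch of what that could look like, assuming the field type from the question is split into separate index and query analyzers (the split and the comments are my assumption, not something taken from the question). The synonym filter appears only in the index chain, with expand set to true:

```xml
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <!-- Index-time chain: synonyms are applied here, so the variants end up in the index -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymFilterFactory" tokenizerFactory="solr.KeywordTokenizerFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  </analyzer>
  <!-- Query-time chain: same tokenizing and lowercasing, but no synonym filter -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Since the synonyms are applied at index time, existing documents would need to be reindexed before a change like this has any effect.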

spyk

There are 2 operators in synonyms.txt, , and =>, and I guess you are using one where you want the other.

The => operator replaces one piece of text with another, which is very useful for normalisation. Advantage: it doesn't grow your index and doesn't add ambiguity. Drawback: you must apply the filter at both index and query time. Example: doesn't => does not. Structurally, you replace one text with another text, so you can't have a list.

The , operator expands one piece of text into ALL the others. It is recommended to use it at index time only (all the synonyms will be in the index and will match any of the words). Drawback: it will grow your index. The , operator can also be used at query time only, but the behaviour can be quite difficult to predict for complex queries and it will slow down your requests, so that is not recommended.

To have the expected behaviour, you should write:
uk,united kingdom,england,scotland,wales

Beware that, depending on the tokenizer used, there may be some issues related to multi-word synonyms (there are already many threads about this): a search for "kingdom" will find all the documents indexed with UK. Which may be the expected behaviour... or not.

Addendum: I just realised you may want to replace "uk" with "united kingdom,england,scotland,wales" as literal text. In this case, you have to escape the , (replace it with \, if my memory is correct). Again, the result of your search will depend heavily on how it is tokenised.

Pr Shadoko