Solr: how can I rank the original term ahead of its stemmed version?

Question

I have been trying to get exact keyword matches to appear first in Solr 5.0.0 results.

For example, given these documents:

Meditation Bowls
Goddess Bowls
Celestial Bowls
Bowling Green
33 Bowls Tibetan Singing Bowls
Dust Bowl Revival
Bowl of Stars

If I search for the word bowl, the expected results are:

Dust Bowl Revival
Bowl of Stars
Meditation Bowls
Goddess Bowls
Celestial Bowls
Bowling Green
33 Bowls Tibetan Singing Bowls

Results containing the exact word should come first. My schema is given below:

 <fieldType name="text_wslc" class="solr.TextField" positionIncrementGap="100">
   <analyzer type="index">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.WordDelimiterFilterFactory"
                             generateWordParts="1"
                             generateNumberParts="1"
                             catenateWords="1"
                             catenateNumbers="1"
                             catenateAll="1"
                             preserveOriginal="1"
                             />
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.KeywordRepeatFilterFactory"/>
     <filter class="solr.PorterStemFilterFactory"/>
     <filter class="solr.KStemFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
   <analyzer type="query">
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
     <filter class="solr.WordDelimiterFilterFactory"
                             generateWordParts="1"
                             generateNumberParts="1"
                             catenateWords="1"
                             catenateNumbers="1"
                             catenateAll="1"
                             preserveOriginal="1"
                             />
     <filter class="solr.LowerCaseFilterFactory"/>
     <filter class="solr.KeywordRepeatFilterFactory"/>
     <filter class="solr.PorterStemFilterFactory"/>
     <filter class="solr.KStemFilterFactory"/>
     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
   </analyzer>
 </fieldType>

I had read that using KeywordRepeatFilterFactory returns exact matches first, followed by the stemmed versions, but it's not working for me.

score 4 · Answer 1 · answered Jun 29 '15 at 10:51

You can add another field in schema.xml that holds a copy of your original field:

<field name="title" type="text_wslc" indexed="true" stored="true"/>
<field name="titleExact" type="text_wslcExact" indexed="true" stored="true"/>
<copyField source="title" dest="titleExact"/>

where text_wslcExact is something like this:

<fieldType name="text_wslcExact" class="solr.TextField" positionIncrementGap="100" >
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="20"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LimitTokenCountFilterFactory" maxTokenCount="20"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

The next step is to add this new field to your query and boost it. In your solrconfig.xml, try something like this:

<str name="qf">title titleExact^10</str>
<str name="pf">title^10 titleExact^100</str>
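
As a rough sketch of where these two lines might live: `qf` and `pf` are parameters of the DisMax/eDisMax query parsers, so they are typically set as defaults on a request handler. The handler name `/select` and the choice of `edismax` below are assumptions for illustration, not something stated in the question:

```xml
<!-- Hypothetical handler; adapt the name to the one you actually query. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Assumption: eDisMax is used so that qf and pf are honored. -->
    <str name="defType">edismax</str>
    <!-- Query fields: matches in the unstemmed copy weigh more. -->
    <str name="qf">title titleExact^10</str>
    <!-- Phrase fields: phrase matches in the unstemmed copy weigh most. -->
    <str name="pf">title^10 titleExact^100</str>
  </lst>
</requestHandler>
```

After changing the schema, the documents need to be reindexed so that titleExact is populated by the copyField.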

Here is my source, where you can find the full explanation.

answered Jun 29 '15 at 10:51

alexf


Does that return results containing the exact search word first, ahead of its stemmed versions? – User123 Jun 29 '15 at 11:27
Actually, no boost is necessary, since an exact match will receive a better score just by being found in both fields (title & titleExact) – spyk Jun 29 '15 at 12:52
It depends on how the score is calculated. If the score is only the maximum score across all these fields (as [it is for the DisMax parser without the `tie` parameter](https://wiki.apache.org/solr/ExtendedDisMax#tie_.28Tie_breaker.29)), the boost is needed, isn't it? – alexf Jun 29 '15 at 13:03
Hm, I guess you're right in that case. I had in mind a simpler BooleanQuery, something like: +title:term titleExact:term which would naturally boost matches in both fields. – spyk Jun 29 '15 at 18:16
@alexf Under which request handler do I have to add the qf (title titleExact^10) and pf (title^10 titleExact^100) settings? – User123 Jun 30 '15 at 04:20
The `requestHandler` that you usually use. If you don't use a `requestHandler`, use what spyk said! – alexf Jun 30 '15 at 07:07
