
I have two versions of Solr running on my machine, say SolrVer1 and SolrVer2.

SolrVer1 applies the following stemming filters to the field type text_en_splitting:

<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt" ignoreCase="true"/>
 <filter class="solr.PorterStemFilterFactory" ignoreCase="true"/>

SolrVer2 applies the following stemming filter to the field type text_en_splitting:

<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>

Regular searches behave almost the same on both, but wildcard searches on SolrVer1 do not return results for grammatical variants of a word.

For example, searching for ray* returns far fewer results on SolrVer1 than on SolrVer2. Looking at the results, I found that SolrVer1 does not return documents that contain only ray or rays.

I don't know when I should use SnowballPorterFilterFactory and when I should use PorterStemFilterFactory. What are the pros and cons of each?

Does anybody have an idea about this behavior?

Thanks

meghana

2 Answers


You need to check what each stemmer outputs for ray and rays.

Try stemming them with the online Porter stemmer tool: http://qaa.ath.cx/porter_js_demo.html. It outputs rai for both. That is why ray* finds no matches for them with the Porter stemmer: the indexed term rai does not start with ray.

Here is a tool for the Snowball stemmer: http://snowball.tartarus.org/demo.php. It outputs ray for both ray and rays, which is why you get those results.
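To make the effect concrete, here is a minimal Python sketch of a toy index; it is not Solr code. It assumes the stems quoted above (rai from Porter, ray from Snowball). The lookup tables, the sample documents, `build_index`, and `prefix_search` are hypothetical names used only for illustration.

```python
# Toy model of an inverted index; not Solr's actual implementation.
# The stems below are the ones reported above for "ray" and "rays".
PORTER_STEMS = {"ray": "rai", "rays": "rai"}
SNOWBALL_STEMS = {"ray": "ray", "rays": "ray"}

# Hypothetical one-word documents keyed by id.
DOCS = {1: "ray", 2: "rays", 3: "rayon"}


def build_index(docs, stems):
    """Map each index-time term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for word in text.lower().split():
            # Words missing from the toy table are indexed unchanged (assumption).
            term = stems.get(word, word)
            index.setdefault(term, set()).add(doc_id)
    return index


def prefix_search(index, pattern):
    """Match a trailing-* pattern against indexed terms, with no stemming."""
    prefix = pattern.rstrip("*")
    hits = set()
    for term, doc_ids in index.items():
        if term.startswith(prefix):
            hits |= doc_ids
    return sorted(hits)


porter_index = build_index(DOCS, PORTER_STEMS)
snowball_index = build_index(DOCS, SNOWBALL_STEMS)

print(sorted(porter_index))                   # ['rai', 'rayon']
print(prefix_search(porter_index, "ray*"))    # [3]
print(prefix_search(snowball_index, "ray*"))  # [1, 2, 3]
```

With the Porter-style table, documents 1 and 2 are stored under rai, so the prefix ray only reaches document 3. That mirrors the missing ray and rays results on SolrVer1.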

You may want to read this to compare the two stemmers: http://snowball.tartarus.org/texts/introduction.html

It appears that Snowball was designed to address such shortcomings of Porter.

arun
  • Thanks @arun, for your reply. It's really very helpful. So do you suggest that I should use `SnowballPorterFilterFactory`? – meghana Aug 24 '12 at 11:18
  • In general, snowball (including Porter) stemmers are considered aggressive, meaning they will map a lot of words to the same stem. It is tough to tell which stemmer you should use, since it depends on your search needs. I would recommend asking your customers for examples. You should read https://wiki.apache.org/solr/LanguageAnalysis#Stemming to understand the differences between various Solr stemmers and pick the correct one for your needs. – arun Aug 24 '12 at 16:56
  • Thanks @arun for your reply. I guess you are right; I'll ask my customer for examples. :) – meghana Aug 27 '12 at 07:39

Analyzers

On wildcard and fuzzy searches, no text analysis is performed on the search word.

Since no analysis is done at query time for wildcard searches, the stemmers are not applied to the search term; it is matched as typed against the already-stemmed terms in the index.
The results therefore differ depending on what the stemmers produced at index time.
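The difference between the two query paths can be sketched in Python as follows. This is a toy model, not Solr's query parser. The stem table reuses the Porter output (rai) quoted in the other answer, and `stem`, `INDEXED_TERMS`, and `matches` are hypothetical names introduced for illustration.

```python
# Toy model only; the stem table reuses the Porter output quoted in the other answer.
PORTER_STEMS = {"ray": "rai", "rays": "rai"}


def stem(word):
    # Hypothetical stand-in for the analyzer chain; unknown words pass through.
    return PORTER_STEMS.get(word, word)


# Terms as they would sit in the index after index-time stemming.
INDEXED_TERMS = {stem(w) for w in ["ray", "rays"]}


def matches(query):
    """Return the indexed terms a query would hit."""
    if query.endswith("*"):
        # Wildcard query: the prefix is used as typed, with no stemming.
        prefix = query[:-1]
        return sorted(t for t in INDEXED_TERMS if t.startswith(prefix))
    # Regular query: the term goes through the same stemmer as the index.
    term = stem(query)
    return sorted(t for t in INDEXED_TERMS if t == term)


print(matches("rays"))  # ['rai']
print(matches("ray*"))  # []
print(matches("ra*"))   # ['rai']
```

The regular query rays is stemmed to rai and matches, which is consistent with regular search behaving almost the same on both setups. The wildcard ray* is compared as typed and misses rai, while the shorter prefix ra* still reaches it.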

Jayendra
  • Thanks @Jayendra, for your reply. Yes, I have seen that for wildcard searches the results differ depending on the stemmer applied. – meghana Aug 24 '12 at 11:22