
I am working on spell checking in Solr. I have implemented suggestions and collations in my spellcheck component.

Most of the time collations work fine, but in a few cases they fail.

Working:

I tried the query "gone wthh thes wnd". Here "wnd" does not get the suggestion "wind", yet the collation is correct: "gone with the wind", hits = 117.

Not working:

But when I tried the query "gone wthh thes wint", "wint" does get the suggestion "wind", yet the collation is wrong: instead of "gone with the wind" it gives "gone with the west", hits = 1.

I would also like to know what "hits" means in the collation results.

Configuration:

solrconfig.xml:

<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">textSpellCi</str>
    <lst name="spellchecker">
      <str name="name">default</str>
      <str name="field">gram_ci</str>
      <str name="classname">solr.DirectSolrSpellChecker</str>
      <str name="distanceMeasure">internal</str>
      <float name="accuracy">0.5</float>
      <int name="maxEdits">2</int>
      <int name="minPrefix">0</int>
      <int name="maxInspections">5</int>
      <int name="minQueryLength">2</int>
      <float name="maxQueryFrequency">0.9</float>
      <str name="comparatorClass">freq</str>
    </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="df">gram_ci</str>
      <str name="spellcheck.dictionary">default</str>
      <str name="spellcheck">on</str>
      <str name="spellcheck.extendedResults">true</str>
      <str name="spellcheck.count">25</str>
      <str name="spellcheck.onlyMorePopular">true</str>
      <str name="spellcheck.maxResultsForSuggest">100000000</str>
      <str name="spellcheck.alternativeTermCount">25</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.maxCollations">50</str>
      <str name="spellcheck.maxCollationTries">50</str>
      <str name="spellcheck.collateExtendedResults">true</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
</requestHandler>

Schema.xml:

<field name="gram_ci" type="textSpellCi" indexed="true" stored="true" multiValued="false"/>

<fieldType name="textSpellCi" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="5" minShingleSize="2" outputUnigrams="true"/>
    </analyzer>
</fieldType>
iNikkz

1 Answer


I found the answer to my question. After some in-depth reading, I understood the logic behind collations.

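1) spellcheck.maxCollations: the maximum number of collations returned in the response (default 1). It does not by itself control how many candidate collations are tested against the index.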

2) spellcheck.maxCollationTries: the number of collation candidates Solr tests against the index (by running each one as a query) before giving up. A low value gives better performance but may miss a collation that actually returns results; a high value makes it more likely that a good collation is found, but it harms performance. A value of 0 means collations are not checked against the index at all.
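A sketch of how the collation parameters could sit side by side in the /spell handler defaults. The numbers are illustrative assumptions, not tuned recommendations, and spellcheck.maxCollationEvaluations is an extra parameter that the original configuration does not use.

```xml
<lst name="defaults">
  <str name="spellcheck.collate">true</str>
  <!-- How many collations to return in the response (illustrative value) -->
  <str name="spellcheck.maxCollations">5</str>
  <!-- How many candidate collations to run as queries against the index
       before giving up; 0 means collations are not checked at all -->
  <str name="spellcheck.maxCollationTries">100</str>
  <!-- How many word-correction combinations to rank before choosing which
       candidates to try; 10000 is the documented default -->
  <str name="spellcheck.maxCollationEvaluations">10000</str>
</lst>
```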

So by increasing the value of spellcheck.maxCollationTries, the collation for "gone wthh thes wint" becomes "gone with the wind", but again, this harms performance.
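If the cost of a higher spellcheck.maxCollationTries becomes a problem, one option that may be worth trying (an assumption, not something tested here) is spellcheck.collateMaxCollectDocs. It limits how many documents Solr collects while testing each candidate collation. The hits value shown for a collation is the number of documents the collated query matches; with a non-zero limit it becomes an estimate, while 0 collects all documents and gives exact counts.

```xml
<lst name="defaults">
  <str name="spellcheck.collate">true</str>
  <str name="spellcheck.maxCollationTries">100</str>
  <str name="spellcheck.collateExtendedResults">true</str>
  <!-- Stop collecting after 50 documents per tested collation (illustrative
       value); the reported "hits" is then an estimate, not an exact count -->
  <str name="spellcheck.collateMaxCollectDocs">50</str>
</lst>
```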

iNikkz
I want to build spell/query correction functionality. I have 49 GB of indexed data on which I have applied the spellchecker. I want to do the same as Google's "Did you mean": if a user types a question or query that might be misspelled or mistyped, I need to give them a suggestion like "Did you mean...". Is Solr the best tool for this? – iNikkz Feb 23 '15 at 16:24