Need help deciding which type of spellchecker to use in Solr

Question

I have a list of cities in a MySQL database that backs a UI autocompletion feature. I am currently using Solr 5.3.0, and data is imported through scheduled delta imports. I have the following questions:

I want to add spell checking to this feature. I tried using:

DirectSolrSpellChecker
IndexBasedSpellChecker
FileBasedSpellChecker

Out of these three, only FileBasedSpellChecker gives suggestions that actually exist in the db. For example, while searching for cologne I got results like

    {
      "responseHeader":{
        "status":0,
        "QTime":4,
        "params":{
          "q":"searchfield:kolakata",
          "indent":"true",
          "spellcheck":"true",
          "wt":"json"}},
      "response":{"numFound":0,"start":0,"docs":[]
      },
      "spellcheck":{
        "suggestions":[
          "cologne",{
            "numFound":4,
            "startOffset":12,
            "endOffset":19,
            "suggestion":["Cologne",
              "Bologna",
              "Cogne",
              "Bastogne"]}],
        "collations":[
          "collation","searchfield:Cologne"]}}

These cities are pretty accurate and exist in the db/file.

But when I use the other two, I get results like

    {
      "responseHeader":{
        "status":0,
        "QTime":4,
        "params":{
          "q":"searchfield:kolakata",
          "indent":"true",
          "spellcheck":"true",
          "wt":"json"}},
      "response":{"numFound":0,"start":0,"docs":[]
      },
      "spellcheck":{
        "suggestions":[
          "cologne",{
            "numFound":4,
            "startOffset":12,
            "endOffset":19,
            "suggestion":["Cologne",
              "Cologn",
              "Colognei"]}],
        "collations":[
          "collation","searchfield:Cologne"]}}

These are cities that are not present in my db.

Although FileBasedSpellChecker gives satisfactory results, I am a little apprehensive about using it, because I would need to update the file manually every time a city is added or removed. Also, it's generally not advisable to use FileBasedSpellChecker.

I also need to make the suggestions searchable. Currently I am accessing the docs returned in
```
"responseHeader":{"response":{"docs":[<some-format>]}} 
```
to search for results in that city, but now I want the suggester to return its results in the same `<some-format>` instead of plain strings, so that it integrates properly with the UI.
One minor requested change is to sort the suggestions in ascending order of edit (Levenshtein) distance. This is not a hard requirement and is negotiable.
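
For the requirement of getting something richer than plain strings back, one option that might be worth trying is Solr's SuggestComponent with a document-backed dictionary. The following is only a sketch under assumptions: the suggester name `citySuggester` and the `/suggest` handler path are made up for illustration, and it assumes the stored `name`, `searchscore` and `citycode` fields from the schema are suitable as suggestion text, weight and payload. As far as I know, the response carries the term, weight and payload rather than the full document, so the UI would still have to look the city up by its `citycode`.

```xml
<!-- solrconfig.xml: hypothetical suggester built from the indexed city documents -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">citySuggester</str>
    <!-- fuzzy lookup, so small typos in the typed prefix can still match -->
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <!-- read terms, weights and payloads from documents in the index -->
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">name</str>
    <str name="weightField">searchscore</str>
    <str name="payloadField">citycode</str>
    <str name="suggestAnalyzerFieldType">string</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">citySuggester</str>
    <str name="suggest.count">5</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

Because the dictionary is built from the index, it would need a rebuild (for example a request with `suggest.build=true`) after the delta imports add or remove cities.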

Edit: My solrconfig looks like this:

<requestHandler name="/select" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="echoParams">explicit</str>
       <int name="rows">10</int>
       <str name="df">searchfield</str>
       <str name="spellcheck">true</str>
       <str name="spellcheck.collate">true</str>
       <str name="spellcheck.dictionary">file</str>
       <str name="spellcheck.maxCollationTries">5</str>
       <str name="spellcheck.count">5</str>
     </lst>
     <arr name="last-components">
        <str>spellcheck</str>
     </arr>
</requestHandler>

and

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
        <str name="queryAnalyzerFieldType">text_ngram</str>
        <lst name="spellchecker">
                <str name="name">file</str>
                <str name="classname">solr.FileBasedSpellChecker</str>
                <str name="sourceLocation">spellings.txt</str>
                <str name="spellcheckIndexDir">./spellchecker</str>
        </lst>
  </searchComponent>
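
For the ordering request in the question, the Lucene-backed spellcheckers accept `distanceMeasure` and `comparatorClass` options. A minimal sketch, assuming FileBasedSpellChecker honors them the same way IndexBasedSpellChecker does (this should be verified against 5.3.0); everything except the two extra lines is copied from the config above:

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_ngram</str>
  <lst name="spellchecker">
    <str name="name">file</str>
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <!-- assumption: score candidates by Levenshtein similarity
         (Lucene 5.x spells the class name "Levenstein") -->
    <str name="distanceMeasure">org.apache.lucene.search.spell.LevensteinDistance</str>
    <!-- "score" compares by similarity first, "freq" by frequency first -->
    <str name="comparatorClass">score</str>
  </lst>
</searchComponent>
```

A higher similarity score corresponds to a smaller edit distance, so ordering by `score` should approximate ascending Levenshtein distance. The spellcheck index has to be rebuilt (for example with `spellcheck.build=true`) before the change is visible.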

schema looks like this:

<field name="name" type="string" indexed="true" stored="true" multiValued="false" />
<field name="latlng" type="location" indexed="true" stored="true" multiValued="false" />
<field name="citycode" type="string" indexed="true" stored="true" multiValued="false" />
<field name="country" type="string" indexed="true" stored="true" multiValued="false" />
<field name="searchscore" type="float" indexed="true" stored="true" multiValued="false" />
<field name="searchfield" type="text_ngram" indexed="true" stored="false" multiValued="true" omitNorms="true" omitTermFreqAndPositions="true" />
<defaultSearchField>searchfield</defaultSearchField>
<solrQueryParser defaultOperator="OR"/>
<copyField source="name" dest="searchfield"/>

What is the field type of the field you have your cities in? Can it be that you're seeing a stemmed/processed version of the city name? Indexing the city name by itself to a separate field and then using that for suggestions might be better. — MatsLindh, Nov 22 '16 at 14:07
I am indexing it on searchfield whose schema entry looks like this `` — diwakarb, Nov 22 '16 at 14:10
An ngram field will generate loads of tokens, so my guess is that you're seeing suggestions based on partial tokens from that field. Use a regular string field or a keyword-tokenized field to get exact suggestions. — MatsLindh, Nov 22 '16 at 14:20
Unfortunately it didn't work with the following solrconfig change: `stringstring solr.IndexBasedSpellChecker./spellcheckernametrueorg.apache.lucene.search.spell.LevensteinDistance` — diwakarb, Nov 22 '16 at 15:11
Have decided to go with FileBasedSpellChecker. I might try to use FileDictionaryFactory for getting answers for the second question. — diwakarb, Nov 23 '16 at 17:39
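
The keyword-tokenized field suggested in the comments could look roughly like the following. This is only a sketch: the field type `text_exact`, the field `name_spell` and the spellchecker name `direct` are hypothetical names that do not appear in the original schema or config, and whether this yields only whole city names for this data set is untested here.

```xml
<!-- schema.xml: keep each city name as a single lowercased token (no ngrams) -->
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="name_spell" type="text_exact" indexed="true" stored="false" multiValued="false" />
<copyField source="name" dest="name_spell"/>
```

```xml
<!-- solrconfig.xml: spellcheck against the dedicated field instead of searchfield -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_exact</str>
  <lst name="spellchecker">
    <str name="name">direct</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="field">name_spell</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
  </lst>
</searchComponent>
```

The request handler would then need `spellcheck.dictionary` set to `direct`, and the data would have to be reindexed so that `name_spell` is populated. Since the dictionary comes from the index, it would follow the delta imports without a manually maintained file.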
