
I have a list of cities in a MySQL database that backs an autocomplete field in a UI. I am currently using solr-5.3.0, and data is imported through scheduled delta imports. I have the following questions:

  1. I want to add spell checking to this feature. I tried using:

    1. DirectSolrSpellChecker
    2. IndexBasedSpellChecker
    3. FileBasedSpellChecker


    Out of these three, only FileBasedSpellChecker gives suggestions that all exist in the db. For example, while searching for cologne I got results like

        {
          "responseHeader":{
            "status":0,
            "QTime":4,
            "params":{
              "q":"searchfield:kolakata",
              "indent":"true",
              "spellcheck":"true",
              "wt":"json"}},
          "response":{"numFound":0,"start":0,"docs":[]
          },
          "spellcheck":{
            "suggestions":[
              "cologne",{
                "numFound":4,
                "startOffset":12,
                "endOffset":19,
                "suggestion":["Cologne",
                  "Bologna",
                  "Cogne",
                  "Bastogne"]}],
            "collations":[
              "collation","searchfield:Cologne"]}}
    

    These cities are accurate and exist in the db/file.

    But when I use the other two, I get results like

        {
          "responseHeader":{
            "status":0,
            "QTime":4,
            "params":{
              "q":"searchfield:kolakata",
              "indent":"true",
              "spellcheck":"true",
              "wt":"json"}},
          "response":{"numFound":0,"start":0,"docs":[]
          },
          "spellcheck":{
            "suggestions":[
              "cologne",{
                "numFound":4,
                "startOffset":12,
                "endOffset":19,
                "suggestion":["Cologne",
                  "Cologn",
                  "Colognei"]}],
            "collations":[
              "collation","searchfield:Cologne"]}}
    

    These suggestions include cities that are not present in my db.

    FileBasedSpellChecker gives satisfactory results, but I am a little apprehensive about using it because I would need to update the file manually every time a city is added or removed. Also, using FileBasedSpellChecker is generally not advisable.

  2. I also need to make the suggestions searchable. Currently I use the doc returned in

    "responseHeader":{"response":{"docs":[<some-format>]}} 
    

    to search for results in that city, but now I want the suggester to return its results in the same <some-format> instead of plain strings, so that it integrates properly with the UI.

  3. One minor request is to sort the suggestions in ascending order of edit (Levenshtein) distance. This is not a hard requirement and is negotiable.

Edit: My solrconfig looks like this:

<requestHandler name="/select" class="solr.SearchHandler">
     <lst name="defaults">
       <str name="echoParams">explicit</str>
       <int name="rows">10</int>
       <str name="df">searchfield</str>
       <str name="spellcheck">true</str>
       <str name="spellcheck.collate">true</str>
       <str name="spellcheck.dictionary">file</str>
       <str name="spellcheck.maxCollationTries">5</str>
       <str name="spellcheck.count">5</str>
     </lst>
     <arr name="last-components">
        <str>spellcheck</str>
     </arr>
</requestHandler>

and

  <searchComponent name="spellcheck" class="solr.SpellCheckComponent">
        <str name="queryAnalyzerFieldType">text_ngram</str>
        <lst name="spellchecker">
                <str name="name">file</str>
                <str name="classname">solr.FileBasedSpellChecker</str>
                <str name="sourceLocation">spellings.txt</str>
                <str name="spellcheckIndexDir">./spellchecker</str>
        </lst>
  </searchComponent>
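
For the third question, here is a hedged sketch of how the ordering might be made explicit on this component. It assumes that `comparatorClass` and `distanceMeasure` are honored by FileBasedSpellChecker in solr-5.3.0 the same way they are documented for the index-based checkers. Since `score` is the documented default comparator, spelling it out mainly states the intent. Suggestions are ranked by similarity score, highest first, which with a Levenshtein measure should approximate ascending edit distance, although the score is normalized by string length rather than being a raw edit count.

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_ngram</str>
  <lst name="spellchecker">
    <str name="name">file</str>
    <str name="classname">solr.FileBasedSpellChecker</str>
    <str name="sourceLocation">spellings.txt</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <!-- Assumption: rank by similarity score first; "freq" is the other built-in option. -->
    <str name="comparatorClass">score</str>
    <!-- Assumption: use Lucene's Levenshtein-based similarity as the score. -->
    <str name="distanceMeasure">org.apache.lucene.search.spell.LevensteinDistance</str>
  </lst>
</searchComponent>
```

If the returned order still differs from what the UI needs, re-sorting the handful of returned suggestions on the client side remains an option.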

The schema looks like this:

    <field name="name" type="string" indexed="true" stored="true" multiValued="false" />
    <field name="latlng" type="location" indexed="true" stored="true" multiValued="false" />
    <field name="citycode" type="string" indexed="true" stored="true" multiValued="false" />
    <field name="country" type="string" indexed="true" stored="true" multiValued="false" />
    <field name="searchscore" type="float" indexed="true" stored="true" multiValued="false" />
    <field name="searchfield" type="text_ngram" indexed="true" stored="false" multiValued="true" omitNorms="true" omitTermFreqAndPositions="true" />
    <defaultSearchField>searchfield</defaultSearchField>
    <solrQueryParser defaultOperator="OR"/>
    <copyField source="name" dest="searchfield"/>
diwakarb
  • What is the field type of the field you have your cities in? Can it be that you're seeing a stemmed/processed version of the city name? Indexing the city name by itself to a separate field and then using that for suggestions might be better. – MatsLindh Nov 22 '16 at 14:07
  • I am indexing it on searchfield whose schema entry looks like this `` – diwakarb Nov 22 '16 at 14:10
  • A ngram field will generate loads of tokens, so my guess is that you're seeing suggestions based on partial tokens from that field. Use a regular stringfield or a keywordtokenized field to get exact suggestions. – MatsLindh Nov 22 '16 at 14:20
  • Unfortunately it didn't work with the following solrconfig change: `stringstring solr.IndexBasedSpellChecker./spellcheckernametrueorg.apache.lucene.search.spell.LevensteinDistance` – diwakarb Nov 22 '16 at 15:11
  • Have decided to go with FileBasedSpellChecker. I might try to use FileDictionaryFactory for getting answers for the second question. – diwakarb Nov 23 '16 at 17:39
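
A minimal sketch of the setup suggested in the comments above: a dedicated, non-ngram field for spell checking, paired with DirectSolrSpellChecker so that suggestions are drawn from whole city names in the index rather than from ngram fragments. The names `text_city_exact` and `name_spell` and the dictionary name `direct` are hypothetical, and the tuning values are taken from Solr's stock example config rather than tuned for this data. One comment above reports that a similar attempt with a plain string field and IndexBasedSpellChecker did not help, so treat this as something to test, not a confirmed fix.

```xml
<!-- schema.xml (hypothetical names): one token per city name, lowercased -->
<fieldType name="text_city_exact" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="name_spell" type="text_city_exact" indexed="true" stored="false" multiValued="false" />
<copyField source="name" dest="name_spell"/>

<!-- solrconfig.xml: spell check against the exact-name field instead of searchfield -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_city_exact</str>
  <lst name="spellchecker">
    <str name="name">direct</str>
    <str name="field">name_spell</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <str name="distanceMeasure">internal</str>
    <float name="accuracy">0.5</float>
    <int name="maxEdits">2</int>
    <int name="minPrefix">1</int>
    <int name="maxInspections">5</int>
    <int name="minQueryLength">4</int>
    <float name="maxQueryFrequency">0.01</float>
  </lst>
</searchComponent>
```

To try it, the request would need `spellcheck.dictionary=direct` in place of `file`. Because of the lowercase filter, suggestions would come back lowercased, and since the dictionary is read from the index it would follow the delta imports without a separately maintained file.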
