
In solrconfig.xml, is there a parameter that adjusts the spellchecker's tolerance, so that I get multiple suggestions even when the query and the suggestion differ by many letters?

My spellcheck configuration in solrconfig.xml is as follows.

The spellcheck search component:

<lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">title</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
    <str name="distanceMeasure">internal</str>
    <!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
    <float name="accuracy">0.5</float>
    <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
    <int name="maxEdits">2</int>
    <!-- the minimum shared prefix when enumerating terms -->
    <int name="minPrefix">1</int>
    <!-- maximum number of inspections per result. -->
    <int name="maxInspections">5</int>
    <!-- minimum length of a query term to be considered for correction -->
    <int name="minQueryLength">4</int>
    <!-- maximum threshold of documents a query term can appear to be considered for correction -->
    <float name="maxQueryFrequency">0.01</float>
    <!-- uncomment this to require suggestions to occur in 1% of the documents
    <float name="thresholdTokenFrequency">.01</float>
  -->
</lst>

<!-- a spellchecker that can break or combine words.  See "/spell" handler below for usage -->
<lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">title</str>
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">10</int>
</lst>

And the /spell request handler:

<requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
        <str name="df">title</str>
        <!-- Solr will use suggestions from both the 'default' spellchecker
        and from the 'wordbreak' spellchecker and combine them.
        collations (re-written queries) can include a combination of
        corrections from both spellcheckers -->
        <str name="spellcheck.dictionary">default</str>
        <str name="spellcheck.dictionary">wordbreak</str>
        <str name="spellcheck">on</str>
        <str name="spellcheck.extendedResults">true</str>
        <str name="spellcheck.count">10</str>
        <str name="spellcheck.alternativeTermCount">1000</str>
        <str name="spellcheck.maxResultsForSuggest">5</str>
        <str name="spellcheck.collate">true</str>
        <str name="spellcheck.collateExtendedResults">true</str>
        <str name="spellcheck.maxCollationTries">10</str>
        <str name="spellcheck.maxCollations">5</str>
        <str name="spellcheck.onlyMorePopular">false</str>
    </lst>
    <arr name="last-components">
        <str>spellcheck</str>
    </arr>
</requestHandler>

My issue is that I always get only one suggestion per query. For example, for the query renou, I want to get renault as a suggestion, even if there are other words that are closer.

Hakim

1 Answer


For your case a Suggester is a better fit than a Spellchecker, because the Spellchecker only generates candidates that are 1 or 2 letter changes away from the query term. A Suggester returns words that begin with your query. To suggest words that also contain spelling changes, you should use FuzzySuggester.
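
Here is a rough sketch of what that could look like in solrconfig.xml. It assumes a Solr version that ships `solr.SuggestComponent` (4.7 or later) and a `text_general` field type in your schema. The names `suggest`, `fuzzyTitle` and `/suggest` are placeholders chosen for illustration, and the parameter values are not tuned:

```xml
<!-- Hypothetical fuzzy suggester over the same "title" field -->
<searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
        <str name="name">fuzzyTitle</str>
        <str name="lookupImpl">FuzzyLookupFactory</str>
        <str name="dictionaryImpl">DocumentDictionaryFactory</str>
        <str name="field">title</str>
        <!-- assumption: a field type with this name exists in the schema -->
        <str name="suggestAnalyzerFieldType">text_general</str>
        <!-- edits allowed inside the typed prefix: can be 1 or 2 -->
        <int name="maxEdits">2</int>
        <!-- leading characters that must match exactly -->
        <int name="nonFuzzyPrefix">1</int>
        <!-- fuzzy matching only starts once the query is this long -->
        <int name="minFuzzyLength">3</int>
        <str name="buildOnStartup">false</str>
    </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
        <str name="suggest">true</str>
        <str name="suggest.dictionary">fuzzyTitle</str>
        <str name="suggest.count">10</str>
    </lst>
    <arr name="components">
        <str>suggest</str>
    </arr>
</requestHandler>
```

With a setup along these lines, renou is one substitution away from renau, which is a prefix of renault, so renault should be within reach of the fuzzy lookup. The fuzzy suggester is also limited to 2 edits, but the edits are counted against the typed prefix rather than the whole word. The dictionary has to be built once (for example with `suggest.build=true` on a request) before it returns anything.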

Artem Lukanin
  • I think I will stick with the `spellcheck` component because it is better suited to syntax checking. Also, I think the `suggester` is used more for `autocompletion` than for `syntax checking` using `shingles filter`. – Hakim Sep 19 '13 at 13:07
  • Yes, but your example is for the beginning of the word. If you expect the spellchecker to generate hypotheses for 4-letter distances in different parts of words, you cannot use the spellchecker, as it only generates candidates at 1- and 2-letter distances. You can try applying [solr.DoubleMetaphoneFilterFactor](http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.DoubleMetaphoneFilterFactor) to the field that `DirectSpellChecker` uses, to get beyond 2-letter distances. – Artem Lukanin Sep 26 '13 at 11:02
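
The phonetic approach from the last comment could be sketched in the schema roughly as below. This is only an illustration: the names `text_phonetic` and `title_phonetic` are made up, and the tokenizer and filter chain are assumptions rather than anything taken from the original setup.

```xml
<!-- Hypothetical phonetic field type; goes in the schema, not in solrconfig.xml -->
<fieldType name="text_phonetic" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- inject="true" keeps the original token next to its phonetic code -->
        <filter class="solr.DoubleMetaphoneFilterFactory" inject="true"/>
    </analyzer>
</fieldType>

<!-- Separate copy of "title" so the original field stays untouched -->
<field name="title_phonetic" type="text_phonetic" indexed="true" stored="false"/>
<copyField source="title" dest="title_phonetic"/>
```

The spellchecker's `field` would then point at `title_phonetic` instead of `title`. Since `DirectSolrSpellChecker` draws its suggestions from the indexed terms of that field, the phonetic codes themselves become candidate terms, so it is worth checking the returned suggestions on real data before relying on this.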