Solr - configuration suggestions search substring

Question

I'm using Solr 7.5 to provide suggestions on categories through "/suggester". The suggestions feed the autocomplete function of our Solr integration.

Indexed items:

"Roof"
"Roof Panels"
"Sandwich Panels"

Expected behaviour

Search: "roo" -> Result: "Roof" & "Roof Panels"

Search "pane" -> Result: "Roof Panels" & "Sandwich Panels"

Problems

I've tried several solutions with different tokenizers, without success:

StandardTokenizer returns only single words.

KeywordTokenizer returns the complete phrase, but then a search for "panel" yields no suggestions. I would expect "Sandwich Panels" & "Roof Panels".

ShingleFilterFactory gives me strange results: a search for "roof panel" returns "roof panels" / "roof roof panels" / "roof sandwich panels".

Latest configuration

Solr document:

"autosuggest_en":["Roof Panels",
      "Sandwich Panels",
      "Roof Panels",
      "Sandwich Panels"],

"spellcheck_en":["Roof Panels",
      "Sandwich Panels",
      "Roof Panels",
      "Sandwich Panels"],

solrconfig.xml

<searchComponent name="suggest" class="solr.SpellCheckComponent">
    <str name="queryAnalyzerFieldType">text_spell</str>
    <lst name="spellchecker">
        <str name="name">default</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
        <str name="suggestAnalyzerFieldType">text_spell</str>
        <str name="field">autosuggest</str>
        <str name="buildOnCommit">true</str>
        <str name="buildOnOptimize">true</str>
        <str name="accuracy">0.35</str>
    </lst>
</searchComponent>

schema.xml

<fieldType name="text_spell" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="10"
                outputUnigrams="true" outputUnigramsIfNoShingles="false" tokenSeparator=" " fillerToken="_"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="10"
                outputUnigrams="true" outputUnigramsIfNoShingles="false" tokenSeparator=" " fillerToken="_"/>
    </analyzer>
</fieldType>

The configuration above gives me the following behaviour:

search: "roof" -> results: "roof" & "roof panels" = Good

search: "roof pane" -> results: "roof panels" & "roof roof panels" = Not good. I don't know why "roof" is repeated.

Any advice on a proper configuration for the expected behaviour?

Thanks!

Best regards

score 0 · Answer 1 · answered Mar 19 '19 at 10:44

To narrow down the problem, you could take the following steps:

Take a look at your synonyms.txt. Does it have any content? If so, disable that file for a test.
Use the Analysis screen in the Solr admin UI to see how Solr processes your search term.
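
As a minimal sketch of the first step, the query analyzer of the text_spell field type from the question could look like this with the synonym filter commented out for the test. Everything except the comment is copied from the question's schema.xml; whether the synonyms actually cause the repeated "roof" is an assumption that the test is meant to confirm or rule out.

```xml
<analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Temporarily disabled for the test (assumption: synonyms.txt may inject extra tokens).
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="10"
            outputUnigrams="true" outputUnigramsIfNoShingles="false" tokenSeparator=" " fillerToken="_"/>
</analyzer>
```

Reload the core after changing the schema so the change takes effect, then repeat the "roof pane" search. If the duplicated "roof" disappears, an entry in synonyms.txt is the likely cause; if not, compare the index and query token output for text_spell on the Analysis screen.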
