I'm using Solr 7.5 to generate category suggestions via "/suggester". The suggestions feed the autocomplete feature of a Solr integration.
Indexed items:
- "Roof"
- "Roof Panels"
- "Sandwich Panels"
Expected behaviour
Search: "roo" -> Result: "Roof" & "Roof Panels"
Search: "pane" -> Result: "Roof Panels" & "Sandwich Panels"
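To pin down the matching I have in mind, here is a minimal Python sketch. It is not Solr code: `suggest` and `ITEMS` are names made up for this illustration, and it assumes that a phrase should match when the typed text is a case-insensitive prefix of the phrase starting at any word boundary.

```python
# Illustration only: the indexed items from above, held in a plain list.
ITEMS = ["Roof", "Roof Panels", "Sandwich Panels"]


def suggest(query, items=ITEMS):
    """Return items in which the query is a prefix starting at some word boundary."""
    # Normalise case and whitespace so "Roof  pane" behaves like "roof pane".
    q = " ".join(query.lower().split())
    if not q:
        return []
    matches = []
    for item in items:
        words = item.lower().split()
        # Try every word boundary as a possible starting point of the match.
        if any(" ".join(words[i:]).startswith(q) for i in range(len(words))):
            matches.append(item)
    return matches


print(suggest("roo"))        # ['Roof', 'Roof Panels']
print(suggest("pane"))       # ['Roof Panels', 'Sandwich Panels']
print(suggest("roof pane"))  # ['Roof Panels']
```

The contiguous-prefix rule is my assumption about what "expected" means here; a looser rule that matches each typed word independently would also satisfy the two examples above.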
Problems
I've tried several tokenizers and filters, none of which gave the expected results:
- StandardTokenizer returns single words only.
- KeywordTokenizer returns the complete phrase, but a search for "panel" then returns no suggestions at all. I would expect "Sandwich Panels" & "Roof Panels".
- ShingleFilterFactory gives strange results: a search for "roof panel" returns "roof panels" / "roof roof panels" / "roof sandwich panels".
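For reference, this is my understanding of what a shingle filter emits for a single field value, written as a simplified Python simulation rather than the real Lucene code. `shingles` is a made-up helper; it only models `maxShingleSize` and `outputUnigrams`, and it ignores position gaps, filler tokens and `outputUnigramsIfNoShingles`.

```python
def shingles(text, max_size=10, output_unigrams=True):
    """Simplified stand-in for a shingle filter applied to one field value."""
    # Crude whitespace tokenisation plus lowercasing, standing in for the
    # tokenizer and lowercase filter that run before the shingle filter.
    tokens = text.lower().split()
    min_size = 1 if output_unigrams else 2
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size > len(tokens):
                break  # not enough tokens left for a shingle of this size
            out.append(" ".join(tokens[start:start + size]))
    return out


print(shingles("Roof Panels"))  # ['roof', 'roof panels', 'panels']
```

Under these assumptions, the value "Roof Panels" on its own yields only "roof", "roof panels" and "panels".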
Latest configuration
Solr document:
"autosuggest_en":["Roof Panels",
"Sandwich Panels",
"Roof Panels",
"Sandwich Panels"],
"spellcheck_en":["Roof Panels",
"Sandwich Panels",
"Roof Panels",
"Sandwich Panels"],
solrconfig.xml
<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">text_spell</str>
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="suggestAnalyzerFieldType">text_spell</str>
    <str name="field">autosuggest</str>
    <str name="buildOnCommit">true</str>
    <str name="buildOnOptimize">true</str>
    <str name="accuracy">0.35</str>
  </lst>
</searchComponent>
schema.xml
<fieldType name="text_spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="10"
            outputUnigrams="true" outputUnigramsIfNoShingles="false" tokenSeparator=" " fillerToken="_"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="10"
            outputUnigrams="true" outputUnigramsIfNoShingles="false" tokenSeparator=" " fillerToken="_"/>
  </analyzer>
</fieldType>
The configuration above gives me the following behaviour:
Search: "roof" -> Result: "roof" & "roof panels" = Good
Search: "roof pane" -> Result: "roof panels" & "roof roof panels" = Not good. I don't know why "roof" is repeated.
Any advice on a proper solution for the expected behaviour?
Thanks!
Best regards