I've looked through a ton of examples and other questions here and from them, I've got my config very close to what I need but I'm missing one last little bit that I'm having a heck of a time working out. I'm searching on values like:

solar powered
solar glass
solar globe
solar lights
solar magic
solid brass
solid copper

What I want:

  1. If I search for sol the result should include all these values. This works.
  2. If I search for solar I should get just the first five. This works.
  3. If I search for solar gl I should get only solar glass and solar globe. This does not work. Instead, I get one set of matches for solar and a second set of matches for gl.

In a nutshell, I want to consider the input string as a whole, regardless of any whitespace. I gather this is accomplished by creating a separate query (versus index) analyzer, but I've not been able to make it work. Can anyone suggest a configuration that will get me what I'm looking for?

I've (unsuccessfully) tried:

  • Querying with "solar gl"
  • Querying with mm=100%
  • Defining separate query and index analyzers both using KeywordTokenizerFactory. (I don't know what the heck I thought that would do.)
  • Defining an index analyzer but not a query analyzer.
  • Defining a query analyzer with no tokenizer.

Here's my current schema:

<field name="suggest_phrase" type="suggest_phrase"
    indexed="true" stored="false" multiValued="false" />

And the field definition:

<fieldType name="suggest_phrase" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
    </analyzer>
</fieldType>

And the config:

<searchComponent name="suggest_phrase" class="solr.SpellCheckComponent">
    <lst name="spellchecker">
        <str name="name">suggest_phrase</str>
        <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
        <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
        <str name="field">suggest_phrase</str>
        <str name="buildOnCommit">true</str>
    </lst>
</searchComponent>
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest_phrase">
    <lst name="defaults">
        <str name="spellcheck">true</str>
        <str name="spellcheck.dictionary">suggest_phrase</str>
        <str name="spellcheck.onlyMorePopular">true</str>
        <str name="spellcheck.count">10</str>
        <str name="spellcheck.collate">false</str>
    </lst>
    <arr name="components">
        <str>suggest_phrase</str>
    </arr>
</requestHandler>
peterh
Alex Howansky
  • Did you try my solution? – Maurizio In denmark Aug 19 '13 at 19:35
  • Add the `shingle filter` to your field type [Shingles Filter fieldType](http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory). – Hakim Sep 19 '13 at 12:59
  • @h4kim Ok, just tried this but it doesn't give me what I'm looking for. If I query for `green coffee` I still get back two sets of matches, one for the word `green` and a separate one for the word `coffee`. (These individual match sets then contain the "shingled" terms.) What I'm looking for is to get a list of only the documents that contain the exact string `greencoffee` and not `"green" OR "coffee"`. – Alex Howansky Sep 19 '13 at 16:11

3 Answers


Found the answer, finally! I knew I was really close. Turns out my configuration above was correct and I simply needed to change my query.

  1. Use KeywordTokenizerFactory so that the strings get indexed as a whole.
  2. Use SpellCheckComponent for the request handler.
  3. The piece I was missing -- don't query with q=<string> but with spellcheck.q=<string>.

Given the source strings noted above and a query of spellcheck.q=solar+gl this yields the desired results:

solar glass
solar globe
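
For anyone who wants a concrete request, here is a minimal sketch in Python that builds the URL for the /suggest_phrase handler from the question. The host, port, and core name (`localhost:8983`, `mycore`) are assumptions for illustration only, as is the `suggest_phrase_url` helper; the point is just that the whole input goes into spellcheck.q rather than q.

```python
from urllib.parse import urlencode

# Hypothetical base URL: host, port, and core name are assumptions.
SOLR_BASE = "http://localhost:8983/solr/mycore"


def suggest_phrase_url(prefix, base=SOLR_BASE):
    """Build a request URL for the /suggest_phrase handler defined in the question."""
    # The full input string is sent as spellcheck.q (not q), so the
    # space stays part of the prefix instead of splitting it into two terms.
    params = {"spellcheck.q": prefix, "wt": "json"}
    return f"{base}/suggest_phrase?{urlencode(params)}"


if __name__ == "__main__":
    print(suggest_phrase_url("solar gl"))
```

The space is encoded as `+`, which matches the spellcheck.q=solar+gl form shown above.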
Alex Howansky
  • what if you query 'spellcheck.q=glass' ? – Maurizio In denmark Sep 20 '13 at 06:54
  • I'll get strings that start with glass: glass cleaner, glass bottle, glass window. – Alex Howansky Sep 20 '13 at 13:22
  • @AlexHowansky Which version of Solr are you using and do you mind posting the schema.xml and solrconfig.xml? Thanks. – xelber Oct 16 '13 at 22:42
  • This has been used successfully with v4.4 and v4.5. The schema and config are currently as noted in the original post. I only had to change the query string to get it to work. – Alex Howansky Oct 16 '13 at 23:47
  • @AlexHowansky : Elegant. I found people making changes in the classpath for doing this. – nish Nov 27 '13 at 09:52
  • @AlexHowansky: Thanks! It helped a lot. One question: is it possible to search for spellcheck.q=gla and get solar glass as a result? – Bhuvan Mar 17 '15 at 10:18
  • Hmm good question. I'm not sure offhand and my test setup is long gone, so I can't check easily, but I would guess that, yes, it will return that result because you're not using anchors. – Alex Howansky Mar 17 '15 at 15:09
  • Hey Alex, I'm struggling with some solr autocomplete integration [http://stackoverflow.com/questions/39843424/apache-solr-search-autocomplete]. can you please help me to figure it out. – batMask Oct 11 '16 at 16:32
  • Also I believe you could have just changed the field type class to string – Ben Call May 15 '18 at 13:07
  • can anyone provide any example for the same? – Pradip Vadher Mar 20 '20 at 06:53

You can use either AnalyzingInfixLookupFactory or FreeTextLookupFactory:

  • AnalyzingInfixLookupFactory returns the entire content of the field.
  • FreeTextLookupFactory returns a defined number of tokens.

You can find more details and other suggester algorithms here: http://alexbenedetti.blogspot.de/2015/07/solr-you-complete-me.html

Solr Configuration

<lst name="suggester">
  <str name="name">AnalyzingInfixSuggester</str>
  <str name="lookupImpl">AnalyzingInfixLookupFactory</str> 
  <str name="dictionaryImpl">DocumentDictionaryFactory</str>
  <str name="field">title</str>
  <str name="weightField">price</str>
  <str name="suggestAnalyzerFieldType">text_en</str>
</lst>

<lst name="suggester">
  <str name="name">FreeTextSuggester</str>
  <str name="lookupImpl">FreeTextLookupFactory</str> 
  <str name="dictionaryImpl">DocumentDictionaryFactory</str>
  <str name="field">title</str>
  <str name="ngrams">3</str>
  <str name="separator"> </str>
  <str name="suggestFreeTextAnalyzerFieldType">text_general</str>
</lst>
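
As a rough usage sketch: suggester definitions like these live inside a solr.SuggestComponent search component, and a request picks one by name through the suggest.dictionary parameter. The Python below only builds such request URLs. The base URL, the core name `mycore`, the `/suggest` handler path, and the `suggester_url` helper are all assumptions for illustration, not part of the configuration above.

```python
from urllib.parse import urlencode

# Hypothetical values: host, port, core name, and handler path are assumptions.
SOLR_BASE = "http://localhost:8983/solr/mycore"
SUGGEST_HANDLER = "/suggest"


def suggester_url(text, dictionary, base=SOLR_BASE, handler=SUGGEST_HANDLER):
    """Build a SuggestComponent request that targets one named suggester."""
    params = {
        "suggest": "true",                 # enable the suggest component
        "suggest.dictionary": dictionary,  # must match the <str name="name"> value
        "suggest.q": text,                 # the partial input typed by the user
        "wt": "json",
    }
    return f"{base}{handler}?{urlencode(params)}"


if __name__ == "__main__":
    for name in ("AnalyzingInfixSuggester", "FreeTextSuggester"):
        print(suggester_url("solar gl", name))
```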
Matthias M

I've tried this many times and came to the conclusion that it is not possible out of the box. I found a workaround for it:

I indexed the data with special characters inserted between the words so that the values would not be tokenized. For example:

solarzzzzzzpowered
solarzzzzzzglass
solarzzzzzzglobe

Then, when you compose your query, make sure you put the same filler characters between the words you type. For example, solar gl becomes solarzzzzzzgl.

This will achieve the behavior you are asking for.
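
A minimal sketch of that transformation in Python, in case it helps. The helper names (`join_words`, `restore_words`) are made up for illustration, and the prefix check at the end only simulates prefix matching in plain Python; it does not talk to Solr.

```python
SEPARATOR = "zzzzzz"  # the filler used in the examples above


def join_words(text, separator=SEPARATOR):
    """Replace whitespace between words with the filler before indexing or querying."""
    return separator.join(text.split())


def restore_words(value, separator=SEPARATOR):
    """Turn a stored or suggested value back into a readable phrase."""
    return value.replace(separator, " ")


if __name__ == "__main__":
    indexed = [join_words(p) for p in ("solar powered", "solar glass", "solar globe")]
    query = join_words("solar gl")
    # Plain-Python stand-in for a prefix lookup, for illustration only.
    matches = [restore_words(v) for v in indexed if v.startswith(query)]
    print(query)
    print(matches)
```

Note that this breaks down if a real word ever contains the filler string, so the filler has to be something that cannot occur in the data.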

Another option would be to skip the autosuggestion field and build a custom field yourself, but then you have to manage the wildcard search and all of the indexing on your own, which is not very convenient in terms of time or performance.

Maurizio In denmark
  • Hi Maurizio, pinging you so you see my answer above -- finally got it to work and thought you'd be interested in the result. Cheers. – Alex Howansky Sep 19 '13 at 19:46