Solr whitespace suggester/query analyser

Question

I would like to know whether it is possible to have a whitespace suggester. I have worked with the suggester, but it only returns suggestions for single tokens.

Example of what I'm looking for:

Indexed item: b123-456

This gets tokenized as b123 and 456. Now a user searches for b123456, and the search returns 0 results. (If the search does return results, this situation doesn't arise.)

In that case I would like a suggestion that recommends searching for b123 and 456 separately.

The idea is that it splits the long alphanumeric term at several positions, checks whether each resulting token exists in the index, and scores a split even higher if two or three of its tokens exist.

I could write my own code that splits the term up, but that would take thousands of queries to get a result.

Is there anything that shows this kind of behaviour?

If a whitespace suggester isn't possible because of the large number of possible splits, maybe a suggester that ignores special characters like "-", "/" and "." would do.

Answer (answered May 25 '14 at 17:35)

The best way to do this is to configure the field type's index-time and query-time analyzers accordingly in schema.xml.

That said, I would suggest doing a bit of research on how index-time and query-time analysis work in Solr. My guess is that you should focus on solr.WordDelimiterFilterFactory (see the example below).
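For the suggestion part of the question specifically, Solr (4.0 and later) also ships a spellchecker, solr.WordBreakSolrSpellChecker, that tries breaking a query term into pieces that exist as terms in an indexed field, which is close to the behaviour described in the question. The solrconfig.xml sketch below is untested; the field name "partnumber" is a hypothetical placeholder, and the proposed pieces can only be terms that the field's analyzer actually produced at index time.

```xml
<!-- solrconfig.xml (sketch): a word-break spellchecker.
     "partnumber" is a hypothetical field name; use the real field. -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">partnumber</str>
    <!-- try splitting a term such as "b123456" into indexed terms -->
    <str name="breakWords">true</str>
    <!-- also try joining adjacent query terms into one indexed term -->
    <str name="combineWords">true</str>
    <!-- maximum number of word breaks/combinations applied to the original term(s) -->
    <int name="maxChanges">10</int>
  </lst>
</searchComponent>

<!-- attach the component to whichever search handler the application uses -->
<requestHandler name="/select" class="solr.SearchHandler">
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

A request such as q=b123456&spellcheck=true&spellcheck.dictionary=wordbreak&spellcheck.collate=true should then return split suggestions if the pieces exist in the index. Note that with the WordDelimiterFilterFactory type below, the indexed terms would be pieces like b, 123 and 456 rather than b123, so the suggestions depend on which analyzer the field uses.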

Here is one schema.xml field type example that might be useful to start with (no guarantees :-) ).

Good luck!

<!-- A text field with defaults appropriate for English, plus
     aggressive word-splitting and autophrase features enabled.
     This field is just like text_en, except it adds
     WordDelimiterFilter to enable splitting and matching of
     words on case-change, alpha numeric boundaries, and
     non-alphanumeric chars.  This means certain compound word
     cases will work, for example query "wi fi" will match
     document "WiFi" or "wi-fi".
     -->
    <fieldType name="text_en_splitting" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
        <analyzer type="index">
            <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
            <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
        <analyzer type="query">
            <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
            <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
    </fieldType>
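For the fallback mentioned in the question (ignoring "-", "/" and "."), another option is a second copy of the field in which those characters are removed before tokenizing, so that b123-456 and b123456 both end up as the single token b123456. This is a minimal sketch under stated assumptions: the names text_joined, partnumber and partnumber_joined are hypothetical, and the original field is kept because a query for b123 alone would no longer match the joined field.

```xml
<!-- schema.xml (sketch): "-", "/" and "." are stripped before tokenizing,
     so "b123-456" and "b123456" both become the token "b123456".
     All names below are hypothetical. -->
<fieldType name="text_joined" class="solr.TextField" positionIncrementGap="100">
  <!-- no type attribute: the same analyzer runs at index and query time -->
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[-/.]" replacement=""/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- copy the original (hypothetical) field into the joined variant -->
<field name="partnumber_joined" type="text_joined" indexed="true" stored="false"/>
<copyField source="partnumber" dest="partnumber_joined"/>
```

Both fields can then be searched together, for example with the edismax parser and qf="partnumber partnumber_joined".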
