I am using Solr for spell checking / query correction. I added solr.PhoneticFilterFactory and solr.NGramFilterFactory to the fieldType to do this. It works, but the problem is that each query matches a large number of documents. I need only the most likely words/documents, in other words, the ones closest to the query.

Snippet of schema.xml:

<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <filter class="solr.TrimFilterFactory"/>
        <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="1000"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <filter class="solr.TrimFilterFactory"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PhoneticFilterFactory" encoder="DoubleMetaphone" inject="true"/>
    </analyzer>
</fieldType>

Example: for the query "piece", I get a numFound (number of matching documents) of around 780. I need to reduce this count while still keeping the most likely documents.

iNikkz
  • Any reason why you're using _both_ ngram and phonetic? That will result in almost every document matching every (shortish) query. You might also want to test a different phonetic encoder. – MatsLindh Dec 15 '14 at 13:48
  • @MatsLindh: I tried **different phonetic encoders**, but I think the **DoubleMetaphone encoder** is the best among them. Is there something like a **threshold** with which I can get only the **most popular terms/documents** for the query? – iNikkz Dec 16 '14 at 05:21
  • @iNikkz can you share which terms get matched with `piece`? – sidgate Dec 22 '14 at 10:25
  • @sidgate: I am getting words like 'preace', 'place', 'peace', etc. There are 780 words. – iNikkz Dec 22 '14 at 11:17
  • @iNikkz - Were you able to get a solution here?? – Amartya Apr 16 '16 at 16:00
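
To make MatsLindh's point about the n-gram filter concrete, here is a rough Python sketch of how many tokens the index-side settings `minGramSize="2"` and `maxGramSize="1000"` produce per word. It is a simplified model, not Solr's implementation: the `ngrams` function is a hypothetical helper, it ignores the phonetic filter entirely, and the order in which grams are emitted is not meant to match Solr's.

```python
def ngrams(token, min_gram=2, max_gram=1000):
    """Return every substring of `token` whose length lies in [min_gram, max_gram].

    Simplified stand-in for an n-gram token filter; assumed behaviour,
    not copied from Solr's source.
    """
    grams = []
    longest = min(max_gram, len(token))
    for size in range(min_gram, longest + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


if __name__ == "__main__":
    for word in ["piece", "masterpiece"]:
        grams = ngrams(word)
        # A word of length n yields n * (n - 1) / 2 grams when min_gram is 2
        # and max_gram is at least n.
        print(word, len(grams))
```

Under this model every indexed word contributes all of its substrings of length 2 or more, and in the schema above each of those grams then also passes through the phonetic filter. The un-grammed query term therefore only needs to equal one such gram, or share a phonetic code with one, for the document to match, which is one plausible reason the hit count is so high.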

0 Answers