
Analysis with Index: [screenshot]

In my Solr, I get this result after running the analysis for indexing. I have a number of documents containing the words "Machine Learning", but it seems like something broke and the indexing chain didn't complete. Is there a workaround for this?

The field type for the value being searched is: <field name="Skills" type="text_general" indexed="true" stored="true"/>

EDIT 1:

Analysis with Query: [screenshot]

Kabhi

2 Answers


I'm guessing that "SF" is a stemming filter. The filter removes common endings so that 'machine' can match 'machines', storing 'machin' as the common term in the index. As long as stemming is performed both when indexing and when querying, you should get the result you're looking for.

The EdgeNGramFilter emits a token for each leading prefix of a term, starting at the minimum n-gram size (your filter seems to be configured with 3 as the minimum), so every additional letter produces another token that a query token can match.
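A rough Python sketch of what the edge n-gram step produces for one stemmed token. It assumes a minimum gram size of 3 (as guessed above) and an arbitrary maximum of 15; `edge_ngrams` is a hypothetical helper for illustration, not Lucene's EdgeNGramTokenFilter.

```python
from typing import List


def edge_ngrams(term: str, min_gram: int = 3, max_gram: int = 15) -> List[str]:
    """Return the leading prefixes of `term` from min_gram up to max_gram letters.

    Hypothetical helper: it mimics the idea of an edge n-gram filter for a
    single token; max_gram=15 is an assumed value, not a Solr default.
    """
    upper = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, upper + 1)]


indexed = edge_ngrams("machin")  # the stemmed form stored at index time
print(indexed)                   # ['mac', 'mach', 'machi', 'machin']
print("machine" in indexed)      # False: no prefix of 'machin' is 7 letters long
```

Every prefix is a separate index term, but none of them is 'machine', so an unstemmed query token still finds nothing.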

If you don't also stem at query time, the query 'machine' will not find any matching terms, since the token produced at indexing time has been stored as 'machin'.
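To make the index/query asymmetry concrete, here is a toy Python model of two analysis chains. `toy_stem` is a hypothetical, deliberately tiny suffix stripper (it is not Porter, Snowball, or KStem), and `analyze` and `matches` are illustrative helpers; a hit is modelled as at least one identical final token on both sides.

```python
from typing import Callable, Iterable, List

# Hypothetical suffix stripper for illustration only. "es" is checked before
# "s" so that "machines" and "machine" both end up as "machin".
SUFFIXES = ("es", "e", "s")


def toy_stem(token: str) -> str:
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text: str, filters: List[Callable[[str], str]]) -> List[str]:
    """Whitespace-tokenize `text`, then run every token through `filters` in order."""
    tokens = text.split()
    for token_filter in filters:
        tokens = [token_filter(t) for t in tokens]
    return tokens


def matches(index_tokens: Iterable[str], query_tokens: Iterable[str]) -> bool:
    """Toy match rule: some final token must be identical on both sides."""
    return bool(set(index_tokens) & set(query_tokens))


index_chain = [str.lower, toy_stem]
indexed = analyze("Machine Learning", index_chain)
print(indexed)  # ['machin', 'learning']

# Query side without stemming: 'machine' never meets 'machin'.
print(matches(indexed, analyze("machine", [str.lower])))            # False
# Query side with the same stemmer as the index side: the terms line up.
print(matches(indexed, analyze("machine", [str.lower, toy_stem])))  # True
```

The fix in a real schema is the same idea: run the same stemming filter in both the index and the query analyzer of the field type.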

Use both the "index" and "query" sections on the analysis page to see how each side is parsed and processed, and why they don't end up with the same terms. The final tokens on both sides are compared, and if they're the same, there's a match; IIRC this is shown with a slightly darker background in the interface.

MatsLindh
  • I added the query section as well. SF here has the full words. – Kabhi Feb 17 '17 at 12:12
  • But are both of those SFs stem filters? Synonym filters? You can see the full class name if you hover over the "SF" text. You might also want to have the lowercase filter in the same position as earlier. Adding the field definition will also be useful, but as you can see, when querying the token is 'machine', while 'machin' is the token resulting from indexing. Since those don't match, you don't get a hit. – MatsLindh Feb 17 '17 at 15:01

I'm not sure what your first image stands for, but your two images show a different token filter order.

As a side note on the stem filter: the KStem token filter is a high-performance stemmer for English. All terms must already be lowercased (use the lowercase filter) for this filter to work correctly.

Your first image shows LCF (LowerCaseFilter) as the first token filter, but your second image shows the stem filter running first and the LCF afterwards. That order is not going to work.
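Why the order matters can be sketched with a toy Python chain. The assumption here is a stemmer that, like KStem as described above, expects lowercased input; `toy_dictionary_stem` and its lookup table are hypothetical and much simpler than KStem. Because the table only has lowercase keys, a capitalized token passes through unstemmed when the stemmer runs before lowercasing.

```python
from typing import Callable, List

# Hypothetical lookup-based stemmer (not KStem itself). Its table only holds
# lowercase keys, so "Learning" is not found and is returned unchanged.
TOY_STEMS = {"learning": "learn", "machines": "machine"}


def toy_dictionary_stem(token: str) -> str:
    return TOY_STEMS.get(token, token)


def run_chain(text: str, filters: List[Callable[[str], str]]) -> List[str]:
    """Whitespace-tokenize `text`, then apply each filter to every token in order."""
    tokens = text.split()
    for token_filter in filters:
        tokens = [token_filter(t) for t in tokens]
    return tokens


lowercase_then_stem = [str.lower, toy_dictionary_stem]
stem_then_lowercase = [toy_dictionary_stem, str.lower]

good_index = run_chain("Machine Learning", lowercase_then_stem)
bad_index = run_chain("Machine Learning", stem_then_lowercase)
query = run_chain("learning", lowercase_then_stem)

print(good_index)                          # ['machine', 'learn']
print(bad_index)                           # ['machine', 'learning']
print(bool(set(query) & set(good_index)))  # True
print(bool(set(query) & set(bad_index)))   # False
```

In schema terms, this corresponds to placing the lowercase filter before the stem filter in both the index and the query analyzer, so both sides produce the same final tokens.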

XL Zheng