<analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15"/>
</analyzer>

Above is my query analyzer.

Search ==> Saree

Results of EdgeNGramFilterFactory ==> [sar, sare, saree]

Then the search happens, but products whose title (a boosted field) contains sar appear first in the list. I want the results for saree first, followed by the remaining tokens.

what I want ==> [saree, sare, sar]

Can anyone suggest how I can get the above results? Thanks.

edit1

"rawquerystring": "saree", "querystring": "saree",

"parsedquery": "(+DisjunctionMaxQuery(((category_name_textv:sar category_name_textv:sare category_name_textv:saree) | ((title_textv:sar title_textv:sare title_textv:saree)^0.24) | (product_id_text:sar product_id_text:sare product_id_text:saree) | ((specification_textv:sar specification_textv:sare specification_textv:saree)^0.5) | ((description_textv:sar description_textv:sare description_textv:saree)^0.5)))

I printed the output of debug=all, as suggested by @MatsLindh.

How can I change the pattern of this part of the parsed query

(category_name_textv:sar category_name_textv:sare category_name_textv:saree)

to

(category_name_textv:saree category_name_textv:sare category_name_textv:sar)

Please suggest to achieve the above pattern. Thanks in advance.

vc2
  • The sequence of the generated tokens doesn't matter for scoring, only that a match against a specific token happened. If you append `debug=all` to your query, you should be able to see how the score is calculated for your input query; the token length isn't considered as a scoring element - but a longer token should occur less often, so depending on how the exact score is calculated for the token match, it might differ. But it might not be enough depending on what your boost values are and the score of your document. `debug=all` might tell you more. – MatsLindh Jun 26 '20 at 20:11
  • Yes, I got the info by adding debug=all @MatsLindh, but how do I change this search behaviour? I have added the info to the question. It would help if you could elaborate on it. – vc2 Jun 26 '20 at 21:25
  • It's the scoring part for the documents that have the wrong order that would be interesting - it should show how much each of the terms are contributing to the score. The sequence they're presented in in your query will _not_ affect score in any way, so "switching them around" in the presentation wouldn't change anything. – MatsLindh Jun 27 '20 at 04:36
  • OK, so how should I get these results? Where should I make changes to get **saree** first and then *sar*? @MatsLindh – vc2 Jun 27 '20 at 15:54
  • The `debug=all` part should give you detailed information about how the score is calculated for a specific document; you should be able to see exactly how much the `saree` hit is contributing to the score - that's a decent starting point. Additionally; you probably don't want to perform ngramming when querying, only when indexing - otherwise you'll get hits for any prefix of your search string (so if you've indexed `saree` and you're searching for `sarbaraksd`, you'll still get a hit). If you do not have ngramming when indexing, you won't get any relevant difference in scores. – MatsLindh Jun 27 '20 at 16:54
  • But ngramming while indexing will spoil my index, because sar and sare will be indexed, and I only want saree to be indexed. When searching for saree, a title having sar is coming up first, which I want to prevent. So I want the results ordered as saree, sare, sar. @MatsLindh – vc2 Jun 28 '20 at 20:32
  • You do one field for additional scoring - one where you have index time ngram expansion, effectively making tokens appear in fewer documents when they're longer - this is the one you use for scoring - for example by using it as a boost field. You can then use the field you have today for filtering as a fq (or as the query). – MatsLindh Jun 28 '20 at 20:52
  • `((title_textv:sar title_textv:sare title_textv:saree)^0.24)` I have boosted title by 4.0, so a product with the title "---- sar --- --- " appears in my search. I do want it, but on later pages; I want titles containing '--- saree --- --' to come before those with sar. @MatsLindh – vc2 Jun 29 '20 at 05:05
  • What doesn't work about the strategy I described? i.e. a separate field with index time expansion of edgengrams, so that the token weights map what you want? – MatsLindh Jun 29 '20 at 08:27
  • Yes, I don't want to index using ngramming. – vc2 Jul 01 '20 at 07:40
  • Then you won't get consistent scoring - if `sar` is a less frequent term than `saree`, then `sar` will contribute more to scoring than `saree` - and those documents will be sorted above `saree`. `debug=all` will tell you how the score is calculated for each document for each of those terms. – MatsLindh Jul 01 '20 at 08:21
  • But when indexing, `saree` will be broken into `sar, sare, saree`, so how can they contribute differently when all of them have the same priority? – vc2 Jul 01 '20 at 10:33
  • Because there will be far more occurrences of `sar` in the index than of `saree`. The score from each matching token is calculated as the BM25 score - but to visualize it more easily, the traditional TF/IDF score is more helpful - i.e. how many times a term occurs in a document, divided by how many times it occurs in the index in total. If the number of times it occurs in the index is larger, the score is lower. If it's rarer - i.e. more unique - then it gets scored higher. The longer term will be present in fewer documents, and thus contribute a higher score. There is no "priority". – MatsLindh Jul 01 '20 at 11:38
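
The separate-field strategy from the comments could look roughly like the sketch below. It is only an illustration under assumptions: the names `text_prefix` and `title_prefix` are hypothetical, and it assumes `title_textv` is the source field to copy from. Edge n-grams are produced at index time only, so a query for `saree` is looked up as the single token `saree`, which only titles containing a word that starts with `saree` will have indexed.

```xml
<!-- Hypothetical field type: edge n-grams are generated at index time only. -->
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "saree" is indexed as sar, sare, saree; a title with only "sar" is indexed as sar. -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <!-- No n-gram filter here: the query "saree" stays a single token. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical scoring field, filled from the existing title field. -->
<field name="title_prefix" type="text_prefix" indexed="true" stored="false"/>
<copyField source="title_textv" dest="title_prefix"/>
```

With a dismax-style handler, the new field could then be given its own weight, for example by adding something like `title_prefix^4` to `qf` (the value is illustrative), while the existing fields keep deciding which documents match. A reindex is needed after changing the schema, and `debug=all` should show how much the new field contributes for each document.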

0 Answers