
I am using Solr 5.3.1. I want documents that match more of the query terms to appear first in the results.

For this purpose, I have set omitNorms="true" on every field in the schema. I have also implemented a custom Similarity class, which looks like this:

package org.apache.lucene.search.similarities;

import org.apache.lucene.index.FieldInvertState;

public class MyDefaultSimilarity extends DefaultSimilarity {

    @Override
    public float idf(long docFreq, long numDocs) {
        return 0.5f;
    }

    @Override
    public float lengthNorm(FieldInvertState arg0) {
        return 0.5f;
    }

    @Override
    public float tf(float freq) {
        return 0.5f;
    }

    @Override
    public float coord(int overlap, int maxOverlap) {
        System.out.println("Coord:" + Math.pow(super.coord(overlap, maxOverlap), 2));
        return (float) Math.pow(super.coord(overlap, maxOverlap), 2);
    }
}
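To make the intent of the coord override concrete, here is a minimal standalone sketch of the arithmetic with no Lucene dependency. It assumes the inherited coord is simply overlap / maxOverlap, as in DefaultSimilarity; the class name CoordSketch and its method names are hypothetical and exist only for illustration.

```java
public class CoordSketch {

    // Assumed to mirror DefaultSimilarity.coord: the fraction of query clauses that matched.
    public static float baseCoord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Squared variant, the same transformation the override above applies.
    public static float squaredCoord(int overlap, int maxOverlap) {
        return (float) Math.pow(baseCoord(overlap, maxOverlap), 2);
    }

    public static void main(String[] args) {
        int maxOverlap = 4; // hypothetical query with four optional clauses
        for (int overlap = 1; overlap <= maxOverlap; overlap++) {
            System.out.println(overlap + "/" + maxOverlap + " -> " + squaredCoord(overlap, maxOverlap));
        }
    }
}
```

Squaring widens the gap between partial and full matches, so a document matching every clause is separated further from one matching only half of them. This factor only comes into play for queries with more than one clause.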

I have made the following changes in schema.xml to register the similarity class:

 <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>

      <!-- define the custom similarity class here -->
      <similarity class="org.apache.lucene.search.similarities.MyDefaultSimilarity"> </similarity>
    </fieldType>

<!-- define the global similarity class here -->
<similarity class="solr.SchemaSimilarityFactory"/>

I have made the following change in solrconfig.xml to load the jar containing the similarity class:

<lib dir="${solr.install.dir:../../../..}/dist/" regex="SimilaritySolr.*\.jar" />

I have debugged the query, but every document still gets a score of 1. Here is the debug output showing which parameters affect the score:

1.0 = *:*, product of:
  1.0 = boost
  1.0 = queryNorm

Please let me know if I have missed anything needed to boost the score of documents that have more matching terms.

Abdul Rauf
  • Well, you're returning each score as 1/1 (tf/idf), so the score will be .. 1. The thing you want sounds like the default similarity - is there any reason why you're using a custom similarity class? – MatsLindh Jan 07 '16 at 08:44
  • @MatsLindh, I want to disable every scoring factor except the coordination factor. I want to boost the coordination factor so that documents with more matching terms get a higher score than the others. Then I can sort the documents by score and show the ones with the most matching terms first. – Abdul Rauf Jan 07 '16 at 09:50
  • I also tried overriding only the coord function, but the results are still the same. – Abdul Rauf Jan 07 '16 at 10:04
  • Does the coord factor give you any value in the debug query at all? Have you tried just returning 2 or something similar from any of the fields to see that your similarity is running as you expect (and not an older version)? You're not using sort in any way, I presume? Have you tried running the similarity through a debugger or adding a few log statements / prints to see what the returned values are? – MatsLindh Jan 07 '16 at 10:23
  • No, the coord factor does not show any value in the debug query. I am going to try returning 2 and will share the results. I am sorting the documents by score when querying the data. Can you please tell me how to run the similarity through a debugger? I have no idea how to do that. – Abdul Rauf Jan 07 '16 at 11:19
  • I have tried returning 2. The max score is still 1 for all documents. – Abdul Rauf Jan 07 '16 at 11:55
  • Try 0.5, since values might be clamped to [0,1] - the easiest way is to just println while the similarity is running, as that should be caught by the logs (or use the logging interface). Debugging should be possible by starting the JVM with -Xdebug iirc and then connecting a debugger to the JVM. – MatsLindh Jan 07 '16 at 13:27
  • @MatsLindh, so I should try it only for idf, lengthNorm and tf? Would it be the same for coord? – Abdul Rauf Jan 07 '16 at 13:29
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/100055/discussion-between-abdul-rauf-and-matslindh). – Abdul Rauf Jan 07 '16 at 14:02
