My custom similarity class PercentageSimilarityClass has been added to the classloader, but the results are ranked the same as before.

This is my code. What am I doing wrong?

package org.apache.lucene.search.similarities;
import org.apache.lucene.search.similarities.DefaultSimilarity;

public class PercentageSimilarityClass extends DefaultSimilarity {

  @Override
  public float coord(int overlap, int maxOverlap) {
    return ((overlap /(float)maxOverlap)*(overlap/(float)maxOverlap));
  }

  @Override
  public float queryNorm(float sumOfSquaredWeights) {
    return (float) 1.0;
  }

  @Override
  public float tf(float freq) {
    return (float) 1.0;
  }

  @Override
  public float sloppyFreq(int distance) {
    return (float) 1.0;  
  }

  @Override
  public float idf(long docFreq, long numDocs) {
    return (float) 1.0;
  }
}

I also tried to add in

public PercentageSimilarityClass(){
    super();
}

but it didn't make a difference.

Any help would be greatly appreciated!

Edit

I want Solr to rank documents based on how many of the query's words are found in the document; the more words the higher the ranking.

So I am trying to increase the weight of the coord() factor (by squaring it) and neutralize the other factors (by having them return (float) 1.0).
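To see what squaring does to a coord-style ratio, here is a minimal standalone sketch of the same arithmetic without any Lucene dependency. The class name `CoordDemo` and the method `squaredCoord` are hypothetical names used only for illustration. Because the ratio `overlap / maxOverlap` lies between 0 and 1, squaring it never makes it larger, and a ratio of exactly 1.0 stays at 1.0.

```java
public class CoordDemo {

    // Same arithmetic as the overridden coord() in the question:
    // the overlap ratio multiplied by itself.
    public static float squaredCoord(int overlap, int maxOverlap) {
        float ratio = overlap / (float) maxOverlap;
        return ratio * ratio;
    }

    public static void main(String[] args) {
        // All query terms matched: the ratio is 1.0, so the square is still 1.0.
        System.out.println(squaredCoord(3, 3));
        // Partial matches: the square is smaller than the plain ratio.
        System.out.println(squaredCoord(2, 3));
        System.out.println(squaredCoord(1, 3));
    }
}
```

If every compared document matches all of the query terms, or the query has a single term, this factor is identical for all of them and cannot change their relative order.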

I've calculated what percentage of each returned document is made up of the query's words. Both before and after I added my custom similarity, my top ten ranked documents had the following percentages:

21.74%

12.5%

15.38%

27.59%

10.34%

44.44%

37.5%

14.29%

19.3%

20.0%

The document that is 44.44% comprised of the query's words should have ranked first in this instance, and when I extend the search beyond 10 documents, to 100 or 500 documents, I get many documents that are 70%+ comprised of words in the query term, which have not been ranked first.
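The question does not show how these percentages were computed, so the following is only a sketch under stated assumptions: whitespace tokenization, lowercasing, and hypothetical names (`QueryWordShare`, `percentOfDocument`, `fractionOfQueryMatched`). It contrasts two different measures: the share of a document's tokens that are query words (the percentages listed above), and the share of the query's distinct words that appear in the document. In Lucene's classic TF-IDF similarity, the `overlap` and `maxOverlap` arguments of `coord()` describe the second measure, not the first.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class QueryWordShare {

    // Percentage of the document's tokens that are query words.
    public static double percentOfDocument(String document, String query) {
        Set<String> queryWords = new HashSet<>(Arrays.asList(tokenize(query)));
        String[] docTokens = tokenize(document);
        if (docTokens.length == 0) {
            return 0.0;
        }
        int hits = 0;
        for (String token : docTokens) {
            if (queryWords.contains(token)) {
                hits++;
            }
        }
        return 100.0 * hits / docTokens.length;
    }

    // Fraction of the query's distinct words that occur in the document,
    // which is the kind of ratio coord(overlap, maxOverlap) is built on.
    public static double fractionOfQueryMatched(String document, String query) {
        Set<String> queryWords = new HashSet<>(Arrays.asList(tokenize(query)));
        if (queryWords.isEmpty()) {
            return 0.0;
        }
        Set<String> docWords = new HashSet<>(Arrays.asList(tokenize(document)));
        int matched = 0;
        for (String word : queryWords) {
            if (docWords.contains(word)) {
                matched++;
            }
        }
        return (double) matched / queryWords.size();
    }

    // Assumed tokenization: trim, lowercase, split on whitespace.
    private static String[] tokenize(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }
}
```

A long document and a short one can both contain every query word and so share the same query-side fraction, while their document-side percentages differ widely.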

AnonyMouse
  • You should provide sample inputs, expected outputs and actual outputs for this sort of problem. In addition to this, how is this being invoked? – Nathaniel Ford Aug 16 '14 at 17:03
  • Have you added your Similarity implementation in the schema.xml file? (https://wiki.apache.org/solr/SolrPlugins#Similarity) – spyk Aug 16 '14 at 20:07
  • As spyk points out, you need to configure it in your schema.xml and before implementing a similarity on your own, have you checked [the other similarities provided with Solr](http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/search/similarities/package-summary.html)? – cheffe Aug 17 '14 at 09:14
  • Yeah, it's been implemented in schema.xml as `` and it states it has been added to the classloader on Solr startup, so is there something wrong with the code itself? – AnonyMouse Aug 17 '14 at 12:40
  • You say that your new Similarity implementation did not make any difference to the ranking. Have you checked whether the score for each retrieved doc is the same as before? – spyk Aug 18 '14 at 18:50
  • Hey, thanks for the reply! Turns out that the coord had been returning 1.0 and I had been squaring coord to "make the factor bigger", thus keeping it at 1.0 with no change. – AnonyMouse Aug 18 '14 at 18:54
  • @AnonyMouse, if you solved this problem, please share the solution with me. I am facing the same problem. – Abdul Rauf Jan 06 '16 at 14:03

0 Answers