My custom similarity class, PercentageSimilarityClass, has been added to the classloader, but the results are ranked the same as before.
This is my code. What am I doing wrong?
package org.apache.lucene.search.similarities;

import org.apache.lucene.search.similarities.DefaultSimilarity;

public class PercentageSimilarityClass extends DefaultSimilarity {

    // Square the fraction of query terms matched, to weight it more heavily.
    @Override
    public float coord(int overlap, int maxOverlap) {
        float ratio = overlap / (float) maxOverlap;
        return ratio * ratio;
    }

    // Every other factor returns a constant so it does not affect the score.
    @Override
    public float queryNorm(float sumOfSquaredWeights) {
        return 1.0f;
    }

    @Override
    public float tf(float freq) {
        return 1.0f;
    }

    @Override
    public float sloppyFreq(int distance) {
        return 1.0f;
    }

    @Override
    public float idf(long docFreq, long numDocs) {
        return 1.0f;
    }
}
I also tried adding an explicit no-argument constructor:
public PercentageSimilarityClass() {
    super();
}
but it didn't make a difference.
Any help would be greatly appreciated!
Edit
I want Solr to rank documents based on how many of the query's words are found in the document; the more words the higher the ranking.
So I am trying to increase the weight of the coord() factor (by squaring it) and decrease the influence of the other factors (by having them return a constant 1.0).
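To make the intended effect concrete, here is a minimal standalone sketch of what the squared coord() value would be for a four-term query. It has no Lucene dependency; the class name CoordSketch and the method name squaredCoord are hypothetical and exist only for illustration. It assumes, per the Lucene documentation for coord(), that overlap is the number of query terms matched in the document and maxOverlap is the total number of terms in the query.

```java
// Standalone illustration only; CoordSketch and squaredCoord are hypothetical names.
class CoordSketch {

    // Same arithmetic as the coord() override: (overlap / maxOverlap) squared.
    static float squaredCoord(int overlap, int maxOverlap) {
        float ratio = overlap / (float) maxOverlap;
        return ratio * ratio;
    }

    public static void main(String[] args) {
        // A four-term query: documents matching 1, 2, 3, or all 4 of its terms.
        int maxOverlap = 4;
        for (int overlap = 1; overlap <= maxOverlap; overlap++) {
            System.out.println(overlap + "/" + maxOverlap + " -> "
                    + squaredCoord(overlap, maxOverlap));
        }
    }
}
```

With every other factor held at 1.0, a document matching 2 of the 4 terms would get a coord value of 0.25 and one matching all 4 would get 1.0, so documents containing more of the query's words should come out ahead if this factor is actually being applied.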
I've calculated what percentage of each returned document is made up of the query's words. Both before and after I added my custom similarity, my top ten ranked documents had these percentages:
21.74%
12.5%
15.38%
27.59%
10.34%
44.44%
37.5%
14.29%
19.3%
20.0%
The document that is 44.44% comprised of the query's words should have ranked first in this instance. When I extend the search beyond 10 documents, to 100 or 500 documents, I get many documents that are 70%+ comprised of the query's words but have not been ranked first.
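For clarity about what the percentages measure, here is a rough sketch of one way such a figure could be computed. It is an assumption, not the exact calculation used for the numbers above: it assumes whitespace tokenization and case-insensitive matching, and the names QueryWordPercentage, percentOfDocument, and tokenize are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative only: assumes whitespace tokenization and case-insensitive matching.
class QueryWordPercentage {

    // Percentage of the document's tokens that also appear in the query.
    static double percentOfDocument(String query, String document) {
        Set<String> queryWords = new HashSet<>(Arrays.asList(tokenize(query)));
        String[] docTokens = tokenize(document);
        if (docTokens.length == 0) {
            return 0.0; // avoid dividing by zero for an empty document
        }
        int hits = 0;
        for (String token : docTokens) {
            if (queryWords.contains(token)) {
                hits++;
            }
        }
        return 100.0 * hits / docTokens.length;
    }

    // Lower-case the text and split it on runs of whitespace.
    static String[] tokenize(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }
}
```

For example, percentOfDocument("red apple", "a red apple pie") counts 2 matching tokens out of 4 and returns 50.0. Note that this figure depends on the document's length, since the denominator is the number of tokens in the document.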