
I need to compare the relevance of the search results across different Lucene queries.

I have an indexed set of text documents. When a search is run against this set, I want to return not the N best results but all the results that fit the query "good enough".

This "good enough" parameter will be configurable (say between 0 (document is absolutely irrelevant) and 1 (document is the best match possible)) but I want it to affect all queries in the same way.

From what I have found on the internet it is not a simple task. Could anybody give me a hint about how to approach this problem?

Thanks a lot!

Serpenty
  • Not sure what you mean? Do you want to threshold out query results? It is easy to do that with Solr. With Lucene you need to write a custom collector: look here http://stackoverflow.com/questions/2871558/remove-results-below-a-certain-score-threshold-in-solr-lucene – Mikos Jul 25 '11 at 01:13
  • Thanks Mikos, but as it is written in Shashikant Kore's comment there, scores are relative to queries and hence I can't use the same threshold for measuring "goodness" of results across multiple queries. I am looking into the way of normalizing the scores somehow so that these normalized values mean the same in terms of "goodness" for all queries. – Serpenty Jul 25 '11 at 07:37
  • Ahh! I think I understand your question better now, but I feel it is more suited to statistics than to Lucene per se. You might want to look up ANOVA or the chi-squared test, which might help you determine goodness-of-fit across queries using the array of result document scores. HTH. – Mikos Jul 25 '11 at 08:00

2 Answers


If you want to compare two or more queries, I found a workaround. You can compare your highest-scored document with your query term using the LevensteinDistance or LuceneLevenshteinDistance (Damerau) class, which measures how close your result is to your query term.

The value returned is the similarity between the two strings. Do this for each query you want to compare. You can then rank the queries by the similarity between each query term and its top result, pick the query with the highest similarity, and use it for whatever comes next.

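// Damerau-Levenshtein string distance from Lucene's spellchecker package
import org.apache.lucene.search.spell.LuceneLevenshteinDistance;

LuceneLevenshteinDistance d = new LuceneLevenshteinDistance();

// getDistance returns a float between 0 and 1; 1 means the strings are identical
float similarity = d.getDistance(queryterm, yourResult);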
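
To make the comparison step concrete, here is a minimal sketch in plain Java with no Lucene dependency. It assumes a similarity defined as 1 minus the edit distance divided by the longer string's length, which may not match what LuceneLevenshteinDistance computes exactly. The class name QuerySimilarity, the bestQuery helper, and the threshold parameter are all made up for illustration.

```java
import java.util.Map;

class QuerySimilarity {

    // Classic Levenshtein edit distance (insert, delete, substitute),
    // computed with two rolling rows of the dynamic-programming table.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Assumed normalization: 1.0 for identical strings, 0.0 for maximally different ones.
    static double similarity(String a, String b) {
        int maxLen = Math.max(a.length(), b.length());
        if (maxLen == 0) {
            return 1.0;
        }
        return 1.0 - (double) editDistance(a, b) / maxLen;
    }

    // Hypothetical helper: given each query term mapped to its top result,
    // return the query whose top result is most similar to it,
    // or null if no pair reaches the threshold.
    static String bestQuery(Map<String, String> topResultByQuery, double threshold) {
        String best = null;
        double bestSim = 0.0;
        for (Map.Entry<String, String> e : topResultByQuery.entrySet()) {
            double sim = similarity(e.getKey(), e.getValue());
            if (sim >= threshold && (best == null || sim > bestSim)) {
                best = e.getKey();
                bestSim = sim;
            }
        }
        return best;
    }
}
```

Note that this only measures how close the top result's text is to the query string, so it is probably only meaningful when the compared results are short strings such as titles or single terms.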
piyushj

I was just looking for the answer to this same question. Here's what I found in looking around:

While in general it is not possible to compare across queries, if you have certain restricted types of queries, such as a BooleanQuery consisting of only TermQuerys, then it may be possible to compare results across queries if you disable the coord boost in the BooleanQuery constructor.
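
To illustrate what the coord boost does, here is a small sketch in plain Java rather than actual Lucene code. It assumes the classic TF-IDF similarity, where the coord factor is the number of matched clauses divided by the total number of clauses. CoordDemo and its score helper are hypothetical names, and the per-term scores are made-up numbers.

```java
class CoordDemo {

    // Assumed classic coord factor: fraction of the query's clauses the document matched.
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Hypothetical scoring helper: sum of the matched terms' scores,
    // scaled by coord unless coord is disabled.
    static float score(float[] matchedTermScores, int clauseCount, boolean disableCoord) {
        float sum = 0f;
        for (float s : matchedTermScores) {
            sum += s;
        }
        if (disableCoord) {
            return sum;
        }
        return sum * coord(matchedTermScores.length, clauseCount);
    }
}
```

With coord enabled, a single matching term worth 0.5 scores 0.25 in a two-clause query and 0.125 in a four-clause query. With coord disabled, it scores 0.5 in both, so the score no longer depends on how many clauses the query has.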

Steve