How do you find documents similar to a given document in Lucene? I do not know what the text is; I only know which document it is. Is there a way to find similar documents in Lucene? I am a newbie, so I may need some hand-holding.

javanna
Luke101

1 Answer

You may want to check the MoreLikeThis feature of Lucene.

MoreLikeThis constructs a Lucene query from the terms within a document, which can then be used to find other similar documents in the index.

http://lucene.apache.org/java/3_0_1/api/contrib-queries/org/apache/lucene/search/similar/MoreLikeThis.html

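Sample code (Java, for reference):

import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.similar.MoreLikeThis;

// reader is an open IndexReader; searcher is an IndexSearcher over the same index
MoreLikeThis mlt = new MoreLikeThis(reader); // Pass the index reader
mlt.setFieldNames(new String[] {"title", "author"}); // Specify the fields to compare for similarity

Query query = mlt.like(docID); // Pass the Lucene document id of the source document
TopDocs similarDocs = searcher.search(query, 10); // Use the searcher to fetch the top 10 matches
if (similarDocs.totalHits == 0) {
    // Handle the case where no similar documents were found
}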
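
To make the idea behind MoreLikeThis more concrete, here is a rough, library-free sketch of the general approach: pick the most distinctive terms of the source document (weighted here by a simple tf-idf score) and rank the other documents by how many of those terms they contain. This is only an illustration under simplifying assumptions, not Lucene's actual implementation; the class name MoreLikeThisSketch, its methods, and the sample documents are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; this is NOT how Lucene implements MoreLikeThis.
public class MoreLikeThisSketch {

    /** Splits text into lowercase word tokens. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Returns up to maxTerms terms of the source document with the highest tf-idf score. */
    static List<String> interestingTerms(List<String> docs, int docId, int maxTerms) {
        // Document frequency: in how many documents each term occurs.
        Map<String, Integer> df = new HashMap<>();
        for (String d : docs) {
            for (String t : new HashSet<>(tokenize(d))) df.merge(t, 1, Integer::sum);
        }
        // Term frequency within the source document.
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokenize(docs.get(docId))) tf.merge(t, 1, Integer::sum);

        // Score each term: frequent in this document, rare in the collection.
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            double idf = Math.log((double) docs.size() / df.get(e.getKey())) + 1.0;
            score.put(e.getKey(), e.getValue() * idf);
        }
        // Sort by score descending, breaking ties alphabetically for determinism.
        List<Map.Entry<String, Double>> entries = new ArrayList<>(score.entrySet());
        entries.sort((a, b) -> {
            int c = Double.compare(b.getValue(), a.getValue());
            return c != 0 ? c : a.getKey().compareTo(b.getKey());
        });
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < maxTerms; i++) {
            terms.add(entries.get(i).getKey());
        }
        return terms;
    }

    /** Ranks the other documents by how many of the interesting terms they contain. */
    static List<Integer> similarDocs(List<String> docs, int docId, int maxTerms) {
        List<String> terms = interestingTerms(docs, docId, maxTerms);
        Map<Integer, Integer> hits = new HashMap<>();
        for (int i = 0; i < docs.size(); i++) {
            if (i == docId) continue; // skip the source document itself
            Set<String> tokens = new HashSet<>(tokenize(docs.get(i)));
            int matches = 0;
            for (String t : terms) {
                if (tokens.contains(t)) matches++;
            }
            if (matches > 0) hits.put(i, matches);
        }
        // Most matches first, then lowest document id.
        List<Integer> ids = new ArrayList<>(hits.keySet());
        ids.sort((a, b) -> {
            int c = Integer.compare(hits.get(b), hits.get(a));
            return c != 0 ? c : Integer.compare(a, b);
        });
        return ids;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "lucene index search query",
                "lucene query parser syntax",
                "cooking pasta with tomato sauce",
                "search engine index lucene");
        // Ids of documents similar to document 0, best match first.
        System.out.println(similarDocs(docs, 0, 3));
    }
}
```

In the sketch only the document id is needed as input, which mirrors mlt.like(docID) above: the terms are looked up from the index rather than supplied by the caller.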
Jayendra
  • Oh yes, I already know about the MoreLikeThis feature, but how do I use it to find similar documents if only the document id is known? Actually, I'm using Lucene.NET, but there are only very small differences from the Lucene implementation. – Luke101 Oct 05 '11 at 14:25
  • Added a sample set of code with Java as reference. You may want to check the .NET implementation. – Jayendra Oct 05 '11 at 14:52