Find count of given words in Java Lucene library

Question

In the Lucene tutorial (http://www.lucenetutorial.com/lucene-in-5-minutes.html), I modified the code as follows:

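public class HelloLucene {
    public static void main(String[] args) throws IOException, ParseException {
        // `index` (a Directory) and `config` (an IndexWriterConfig) are created
        // earlier in the tutorial code; that part is omitted here.
        IndexWriter w = new IndexWriter(index, config);
        addDoc(w, "Lucene lucene in Action"); // changed: "lucene" now appears twice
        addDoc(w, "Lucene for Dummies");
        addDoc(w, "Managing Gigabytes");
        addDoc(w, "The Art of Computer Science");
        w.close();

        // Search term: first command-line argument, or "lucene" by default
        String querystr = args.length > 0 ? args[0] : "lucene";
        //...
    }
}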

When I change the first title to "Lucene lucene in Action", as shown above, and then search for the keyword "lucene", the number of hits reported for that document is 1. I want to send a string (e.g. "asd asd fds asd") to a function, search it for "asd", and get 3 as the result. How can I do that after adding the document with addDoc(w, "asd asd fds asd");?

It does not give the number of occurrences within the matching line. It prints 1 if the line contains one or more occurrences, and 0 if it contains none.

score 1 · Answer 1 · edited May 23 '17 at 12:04

I believe what you're looking for is the calculation of term vector frequencies.

See this question on them - How to count term frequency for set of documents?

And this - Get highest frequency terms from Lucene index

If I understand the question, you're asking how to count the number of times an input term (e.g. 'asd') occurs within the documents in your index. In that case you'll need to calculate the term vector frequencies and look up your search term in them to determine whether there is a match and how many times it occurs. Keep in mind that this matches entire words only and is not designed for a full-text proximity search for terms within a corpus of indexed documents.
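The idea of a per-document term frequency can be shown without Lucene. The following is a minimal plain-Java sketch, not Lucene's API: the hypothetical `termFrequencies` and `countTerm` methods lowercase the text and split it on runs of characters that are neither letters nor digits (a rough simplification of `StandardAnalyzer`), then build the term-to-count map that a term vector with frequencies records for one document. Inside a Lucene index, the same per-document count is what `PostingsEnum.freq()` returns for the current document.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class TermFrequencySketch {

    // Hypothetical helper: builds a term -> occurrence-count map for one document,
    // i.e. the information a term vector with frequencies holds.
    // Tokenization (lowercase + split on non-letters/digits) is a simplification
    // of Lucene's StandardAnalyzer, not a reimplementation of it.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> freqs = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) { // a leading delimiter yields one empty token
                freqs.merge(token, 1, Integer::sum);
            }
        }
        return freqs;
    }

    // Hypothetical helper: number of times `term` occurs in `text`
    // after the same normalization is applied to both.
    static int countTerm(String text, String term) {
        return termFrequencies(text).getOrDefault(term.toLowerCase(Locale.ROOT), 0);
    }

    public static void main(String[] args) {
        System.out.println(termFrequencies("asd asd fds asd"));             // {asd=3, fds=1}
        System.out.println(countTerm("Lucene lucene in Action", "lucene")); // 2
    }
}
```

With the example from the question, `countTerm("asd asd fds asd", "asd")` gives 3. Because the text and the search term go through the same normalization, "Lucene" and "lucene" count as the same term, which mirrors the lowercasing `StandardAnalyzer` applies at index time.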

score 1 · Answer 2 · answered Oct 15 '12 at 19:48

I suspect you may be misunderstanding something in your example.

I don't see anything in the example that collects the number of matching terms within a matching document. Perhaps the author's use of the word 'hits' is muddying things somewhat.

The hits variable there stores the matching document IDs and scores as an array of ScoreDoc objects, one per matching document, so hits.length counts matching documents, not occurrences of the term. hits[i].score is the most appropriate thing to look at to determine how strong a match a document is.
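The difference between hits and occurrences can be modeled in plain Java. This sketch is not Lucene code and does no scoring: the hypothetical `Hit` record stands in for a `ScoreDoc`, and the hypothetical `search` method returns one entry per matching document, the same way the tutorial's `hits` array holds one `ScoreDoc` per matching document. Tokenization is simplified to lowercasing plus whitespace splitting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class HitsVsOccurrences {

    // Hypothetical stand-in for one search hit: like a ScoreDoc it identifies a
    // matching document, but it carries a raw occurrence count instead of a score.
    record Hit(int doc, int occurrences) {}

    // Counts whitespace-separated tokens equal to `term`, ignoring case.
    static int occurrences(String text, String term) {
        String wanted = term.toLowerCase(Locale.ROOT);
        int n = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.equals(wanted)) {
                n++;
            }
        }
        return n;
    }

    // Returns one Hit per matching document, however often the term occurs in it.
    static List<Hit> search(List<String> docs, String term) {
        List<Hit> hits = new ArrayList<>();
        for (int doc = 0; doc < docs.size(); doc++) {
            int n = occurrences(docs.get(doc), term);
            if (n > 0) {
                hits.add(new Hit(doc, n));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> titles = List.of(
                "Lucene lucene in Action",
                "Lucene for Dummies",
                "Managing Gigabytes",
                "The Art of Computer Science");
        List<Hit> hits = search(titles, "lucene");
        System.out.println("Found " + hits.size() + " hits."); // prints "Found 2 hits."
        for (Hit hit : hits) {
            System.out.println("doc " + hit.doc() + ": " + hit.occurrences() + " occurrence(s)");
        }
    }
}
```

For "lucene" over the tutorial's four titles it reports 2 hits, even though document 0 contains the term twice. A single document "asd asd fds asd" searched for "asd" would likewise give 1 hit containing 3 occurrences.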

