It seems the Apache Lucene API changes with every version. How can I get the most frequent term from an IndexReader in Apache Lucene 6.4.0?
I saw "Get highest frequency terms from Lucene index", but that answer does not work with Apache Lucene 6.4.0.
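This code works with Lucene 6.4. It finds the most frequent term across all fields. Frequencies are counted per field, so the same text appearing in two fields is compared as two separate entries. To find the most frequent term in a single field, drop the outer loop and call MultiFields.getTerms(reader, fieldName) for that field only.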
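    import java.util.Iterator;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.Fields;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.MultiFields;
    import org.apache.lucene.index.Terms;
    import org.apache.lucene.index.TermsEnum;
    import org.apache.lucene.util.BytesRef;

    // dir is an already opened org.apache.lucene.store.Directory;
    // the snippet belongs inside a method that declares IOException
    try (IndexReader reader = DirectoryReader.open(dir)) {
        final Fields fields = MultiFields.getFields(reader);
        final Iterator<String> iterator = fields.iterator();
        long maxFreq = Long.MIN_VALUE;
        String freqTerm = "";
        while (iterator.hasNext()) {
            final String field = iterator.next();
            final Terms terms = MultiFields.getTerms(reader, field);
            if (terms == null) {
                continue; // this field has no indexed terms
            }
            final TermsEnum it = terms.iterator();
            BytesRef term = it.next();
            while (term != null) {
                // occurrences of this term in this field, summed over all documents
                final long freq = it.totalTermFreq();
                if (freq > maxFreq) {
                    maxFreq = freq;
                    freqTerm = term.utf8ToString();
                }
                term = it.next();
            }
        }
        System.out.println(freqTerm + " " + maxFreq);
    }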
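If you need the N most frequent terms rather than just one, the same loop can feed a small bounded min-heap instead of the maxFreq/freqTerm pair. The sketch below is one possible way to do that using only the JDK. TopTerms, offer and result are hypothetical names made up for this example, not part of Lucene.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical helper (not a Lucene class): keeps the n most frequent (term, freq) pairs offered to it. */
final class TopTerms {
    private final int n;
    // Min-heap ordered by frequency: the head is the weakest of the entries kept so far.
    private final PriorityQueue<Map.Entry<String, Long>> heap =
            new PriorityQueue<>(Comparator.comparingLong((Map.Entry<String, Long> e) -> e.getValue()));

    TopTerms(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
    }

    /** Call once per term, e.g. with term.utf8ToString() and it.totalTermFreq(). */
    void offer(String term, long freq) {
        if (freq < 0) {
            return; // in Lucene 6.x totalTermFreq() is -1 when the frequency is not recorded
        }
        if (heap.size() < n) {
            heap.add(new SimpleEntry<>(term, freq));
        } else if (freq > heap.peek().getValue()) {
            heap.poll(); // evict the current weakest entry
            heap.add(new SimpleEntry<>(term, freq));
        }
    }

    /** Returns the kept entries, most frequent first. */
    List<Map.Entry<String, Long>> result() {
        List<Map.Entry<String, Long>> out = new ArrayList<>(heap);
        out.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        return out;
    }
}
```

To use it, create a TopTerms before the outer loop, replace the if (freq > maxFreq) block with a call to offer(term.utf8ToString(), freq), and print result() at the end. Note that term.utf8ToString() has to be called inside the loop, because the BytesRef returned by the TermsEnum may be reused on the next call to next().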