
It seems the Apache Lucene API changes with every version. How can I get the most frequent term from an IndexReader in Apache Lucene 6.4.0?

I found the question "Get highest frequency terms from Lucene index", but its answer does not work with Apache Lucene 6.4.0.

Siva R

1 Answer


This code works with Lucene 6.4. It finds the most frequent term across all fields. To find the most frequent term in a single field, drop the loop over fields and call MultiFields.getTerms(reader, "yourField") directly (replace "yourField" with your field name).

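        import org.apache.lucene.index.DirectoryReader;
        import org.apache.lucene.index.Fields;
        import org.apache.lucene.index.IndexReader;
        import org.apache.lucene.index.MultiFields;
        import org.apache.lucene.index.Terms;
        import org.apache.lucene.index.TermsEnum;
        import org.apache.lucene.util.BytesRef;

        // `dir` is the org.apache.lucene.store.Directory that holds your index
        try (IndexReader reader = DirectoryReader.open(dir)) {
            // Merged view over all segments; null if the index has no postings
            final Fields fields = MultiFields.getFields(reader);

            long maxFreq = 0;        // stays 0 if no term has a counted frequency
            String freqTerm = "";
            String freqField = "";   // the same term text can occur in several fields
            if (fields != null) {
                for (final String field : fields) {   // Fields is Iterable<String>
                    final Terms terms = MultiFields.getTerms(reader, field);
                    if (terms == null) {
                        continue;                     // field has no indexed terms
                    }
                    final TermsEnum it = terms.iterator();
                    BytesRef term;
                    while ((term = it.next()) != null) {
                        // Total occurrences in all docs; -1 if the field omits term frequencies
                        final long freq = it.totalTermFreq();
                        if (freq > maxFreq) {
                            maxFreq = freq;
                            freqTerm = term.utf8ToString();
                            freqField = field;
                        }
                    }
                }
            }

            System.out.println(freqField + ":" + freqTerm + " " + maxFreq);
        }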
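If you want the top N terms rather than only the single most frequent one (as in the linked question), one option is to replace the single maxFreq comparison with a bounded min-heap. The sketch below is plain Java with no Lucene dependency; TopTerms, offer and ranked are hypothetical names, and in real code you would call top.offer(term.utf8ToString(), it.totalTermFreq()) inside the TermsEnum loop above instead of using the hard-coded values in main.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper: keeps the n highest-frequency terms seen so far
// using a min-heap of size n. Feed it (term, freq) pairs from the TermsEnum loop.
public class TopTerms {
    // One (term, frequency) pair.
    private static final class Entry {
        final String term;
        final long freq;

        Entry(String term, long freq) {
            this.term = term;
            this.freq = freq;
        }
    }

    private final int n;
    // The heap head is the smallest frequency currently kept, so it is evicted first.
    private final PriorityQueue<Entry> heap =
            new PriorityQueue<>((a, b) -> Long.compare(a.freq, b.freq));

    public TopTerms(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
    }

    public void offer(String term, long freq) {
        // Lucene's totalTermFreq() returns -1 when frequencies are not indexed; skip those.
        if (freq < 0) {
            return;
        }
        if (heap.size() < n) {
            heap.add(new Entry(term, freq));
        } else if (freq > heap.peek().freq) {
            heap.poll();
            heap.add(new Entry(term, freq));
        }
    }

    // Terms ordered from most to least frequent.
    public List<String> ranked() {
        List<Entry> entries = new ArrayList<>(heap);
        entries.sort((a, b) -> Long.compare(b.freq, a.freq));
        List<String> result = new ArrayList<>();
        for (Entry e : entries) {
            result.add(e.term);
        }
        return result;
    }

    public static void main(String[] args) {
        TopTerms top = new TopTerms(2);
        top.offer("lucene", 4);
        top.offer("the", 9);
        top.offer("index", 2);
        top.offer("search", 7);
        System.out.println(top.ranked()); // prints [the, search]
    }
}
```

Each offer costs O(log N), so scanning T terms is O(T log N), and memory stays bounded by N entries instead of growing with the vocabulary.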
Mysterion