
I'm trying to get a multi-phrase query to work with a partial match. According to the JavaDoc for MultiPhraseQuery:

A generalized version of PhraseQuery, with the possibility of adding more than one term at the same position that are treated as a disjunction (OR). To use this class to search for the phrase "Microsoft app*" first create a Builder and use MultiPhraseQuery.Builder.add(Term) on the term "microsoft" (assuming lowercase analysis), then find all terms that have "app" as prefix using LeafReader.terms(String), seeking to "app" then iterating and collecting terms until there is no longer that prefix, and finally use MultiPhraseQuery.Builder.add(Term[]) to add them. MultiPhraseQuery.Builder.build() returns the fully constructed (and immutable) MultiPhraseQuery.

https://lucene.apache.org/core/6_6_0/core/org/apache/lucene/search/MultiPhraseQuery.html

I'm struggling with the part where it says:

...find all terms that have "app" as prefix using LeafReader.terms(String), seeking to "app" then iterating and collecting terms until there is no longer that prefix...

How does one seek over these terms? LeafReader.terms(String) gives you a Terms instance, whose iterator() method returns a TermsEnum that you can seek with. I'm just not sure how to extract the matching terms from it.

Martinffx

1 Answer


Sounds like you have a grasp on how to get the TermsEnum, so from there, just seek to the prefix you want to match using seekCeil, and then iterate through the TermsEnum until you get to one that doesn't match the prefix. For example:

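// Needs: java.util.*, org.apache.lucene.index.*, org.apache.lucene.util.BytesRef, org.apache.lucene.util.StringHelper
Terms terms = MultiFields.getTerms(indexReader, "text"); // null if the field has no terms
List<Term> matchingTerms = new ArrayList<>();
if (terms != null) {
    TermsEnum termsEnum = terms.iterator();
    BytesRef prefix = new BytesRef("app");
    // seekCeil moves to the first term >= prefix, or reports END if there is none
    if (termsEnum.seekCeil(prefix) != TermsEnum.SeekStatus.END) {
        BytesRef term = termsEnum.term();
        // next() returns null once the enum is exhausted, so check before comparing
        while (term != null && StringHelper.startsWith(term, prefix)) {
            matchingTerms.add(new Term("text", BytesRef.deepCopyOf(term))); // copy: the enum may reuse its BytesRef
            term = termsEnum.next();
        }
    }
}
System.out.println(matchingTerms);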
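To see the seek-then-scan logic in isolation, here is a minimal sketch that needs no Lucene on the classpath. It models the term dictionary as a sorted set of strings, which is an assumption made purely for illustration: the class PrefixScan, the method collectPrefix, and the sample terms are all hypothetical, and Java's String ordering is only a stand-in for the byte order Lucene uses. The tailSet call plays the role of seekCeil, and the loop stops at the first term that no longer starts with the prefix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

public class PrefixScan {

    // Collects every entry of a sorted dictionary that starts with the given prefix.
    static List<String> collectPrefix(NavigableSet<String> dictionary, String prefix) {
        List<String> matches = new ArrayList<>();
        // tailSet(prefix, true) begins at the first entry >= prefix,
        // which is the same position a seekCeil on the prefix would land on.
        for (String term : dictionary.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: once one term misses, all later ones miss too
            }
            matches.add(term);
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical term dictionary for a single field.
        NavigableSet<String> dictionary = new TreeSet<>(
                Arrays.asList("apex", "app", "apple", "application", "apt", "microsoft"));
        System.out.println(collectPrefix(dictionary, "app")); // prints [app, apple, application]
    }
}
```

With the real index, the collected terms would then go into MultiPhraseQuery.Builder.add(Term[]) after adding the "microsoft" term, as the JavaDoc quoted in the question describes.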
femtoRgon