I'm using Lucene to index a set of sentences. Each of my queries contains two "entities", and I build a proximity query like this:
"EntityA EntityB"~22
and I want to retrieve all the sentences that contain these two entities within a maximum distance of 22 (note that Lucene measures this slop in token positions, not characters). Now I want to use the Lucene Highlighter to retrieve the words between the two entities. I am using code like the following to split the content into fragments, but I don't know how to make a fragment start and end exactly between the two entities.
for (int i = 0; i < numTotalHits; i++) {
    int id = hits[i].doc;
    Document doc = searcher.doc(id);
    String text = doc.get("content");
    // Token stream for the "content" field of this hit (not used further below)
    TokenStream tokenStream = TokenSources.getAnyTokenStream(searcher.getIndexReader(), id, "content", analyzer);
    // getFragmentsWithHighlightedTerms is a helper defined elsewhere, not a Lucene API
    String[] frag = getFragmentsWithHighlightedTerms(analyzer, query, "content", text, 10, 10);
    for (int j = 0; j < frag.length; j++) {
        System.out.println(frag[j]);
    }
}
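The slop in `"EntityA EntityB"~22` is an edit distance over token positions rather than a character count. As a rough illustration only, here is a plain-Java sketch that uses no Lucene classes (`SlopDemo`, `tokenize` and `requiredSlop` are hypothetical names). It approximates the analyzer by lowercasing and splitting on non-alphanumeric characters, which is an assumption and not the real StandardAnalyzer, and then computes the smallest slop a two-term phrase would need. In the Canada/Ottawa sentence, the two entities are 27 characters apart but need a slop of only 3.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SlopDemo {

    // Rough stand-in for an analyzer (assumption): lowercase the text and
    // split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Smallest slop that lets the phrase "a b" match: b's position is shifted
    // back by 1 (its offset inside the phrase) and the remaining distance to
    // a's position is the slop needed. Returns -1 if either term is missing.
    static int requiredSlop(String text, String a, String b) {
        List<String> tokens = tokenize(text);
        String ta = a.toLowerCase(Locale.ROOT);
        String tb = b.toLowerCase(Locale.ROOT);
        int best = -1;
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(ta)) continue;
            for (int j = 0; j < tokens.size(); j++) {
                if (i == j || !tokens.get(j).equals(tb)) continue;
                int slop = Math.abs((j - 1) - i);
                if (best < 0 || slop < best) best = slop;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String s = "Natural Resources Canada, Canadian Forest Service, Ottawa.";
        System.out.println(requiredSlop(s, "Canada", "Ottawa")); // prints 3
    }
}
```

With this rule, two adjacent terms in reverse order need a slop of 2, which is consistent with the PhraseQuery documentation's note that reordering two words takes two moves.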
My aim is to retrieve the text between the two entities, for example:
entity1 --> Canada
entity2 --> Ottawa
sentence --> Natural Resources Canada, Canadian Forest Service, Ottawa.
result --> , Canadian Forest Service,
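Since the goal is the raw text between the two entities rather than a highlighted fragment, one option (an alternative to the Highlighter, not what the code above does) is to work from the stored text and the character offsets of the matched tokens. In Lucene those offsets would come from the token stream's `OffsetAttribute`. The plain-Java sketch below instead assumes single-word entities and a simple letters-and-digits tokenizer in place of the real analyzer; `BetweenEntities` and `textBetween` are hypothetical names.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BetweenEntities {

    // Assumption: a token is a maximal run of letters/digits. Lucene would
    // report the equivalent start/end offsets through OffsetAttribute.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+");

    // Returns the raw text between the last occurrence of token `a` and the
    // next following token `b` (case-insensitive), or null if `b` never
    // follows `a`.
    static String textBetween(String text, String a, String b) {
        String ta = a.toLowerCase(Locale.ROOT);
        String tb = b.toLowerCase(Locale.ROOT);
        Matcher m = TOKEN.matcher(text);
        int endOfA = -1;
        while (m.find()) {
            String tok = m.group().toLowerCase(Locale.ROOT);
            if (tok.equals(ta)) {
                endOfA = m.end();             // remember where entity A ends
            } else if (endOfA >= 0 && tok.equals(tb)) {
                return text.substring(endOfA, m.start()); // up to entity B
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String s = "Natural Resources Canada, Canadian Forest Service, Ottawa.";
        System.out.println("[" + textBetween(s, "Canada", "Ottawa") + "]");
        // prints "[, Canadian Forest Service, ]"
    }
}
```

Multi-word entities or reversed order (Ottawa before Canada) would need extra handling; this sketch only covers the in-order, single-token case from the example.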