
I'm looking for algorithms that find and select the regions of a text that are relevant to a user query, for example to extract a snippet of the text that matches the query.

Can anybody recommend algorithms suitable for this task?

P.S. I saw this question: Is there an algorithm for determining the relevance of a text to a theme? It doesn't solve my problem, though, because I need to select the relevant region within a text, and machine learning algorithms are not suitable for this task.

Simplex

1 Answer


You can use Lucene Highlighter for this. The highlight package of Lucene contains classes to provide "keyword in context" features typically used to highlight search terms in the text of results pages.

The Highlighter class is the central component and can be used to extract the most interesting sections of a piece of text and highlight them, with the help of Fragmenter, fragment Scorer, and Formatter classes. The method getBestTextFragments of the Highlighter class selects the most likely relevant text from a document.

A sample snippet:

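 // Assumes searcher, analyzer, query and hits (a TopDocs) already exist.
 SimpleHTMLFormatter htmlFormatter = new SimpleHTMLFormatter();
 Highlighter highlighter = new Highlighter(htmlFormatter, new QueryScorer(query));
 for (int i = 0; i < hits.scoreDocs.length; i++) {
    int id = hits.scoreDocs[i].doc;
    Document doc = searcher.doc(id);
    String text = doc.get("body");  // stored text of the field being highlighted
    TokenStream tokenStream = TokenSources.getAnyTokenStream(searcher.getIndexReader(), id, "body", analyzer);
    // Up to 10 best-scoring fragments; false = do not merge contiguous fragments
    TextFragment[] frag = highlighter.getBestTextFragments(tokenStream, text, false, 10);
    ...
 }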
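
If you want to see the underlying idea without Lucene, here is a minimal plain-Java sketch. It is not Lucene's implementation, only a simplified illustration of "fragment and score": it slides a fixed-size window of words over the text and keeps the window that contains the most query terms. The class name SnippetSketch, the method names, and the scoring rule (a plain count of matching words) are my own assumptions for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class SnippetSketch {

    /** Lowercases a string and splits it on anything that is not a letter or digit. */
    static String[] tokenize(String s) {
        return Arrays.stream(s.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty())
                .toArray(String[]::new);
    }

    /**
     * Returns the run of windowSize consecutive words with the most
     * query-term occurrences (earliest window wins ties), or "" if
     * no word of the text matches the query.
     */
    static String bestSnippet(String text, String query, int windowSize) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        String trimmed = text.trim();
        Set<String> terms = new HashSet<>(Arrays.asList(tokenize(query)));
        if (trimmed.isEmpty() || terms.isEmpty()) {
            return "";
        }
        // Keep the original words (with punctuation) for output.
        String[] words = trimmed.split("\\s+");
        // hit[i] is 1 when the normalized word i is a query term, else 0.
        int[] hit = new int[words.length];
        for (int i = 0; i < words.length; i++) {
            String norm = words[i].toLowerCase(Locale.ROOT).replaceAll("[^\\p{L}\\p{N}]", "");
            hit[i] = terms.contains(norm) ? 1 : 0;
        }
        int w = Math.min(windowSize, words.length);
        // Score of the first window.
        int score = 0;
        for (int i = 0; i < w; i++) {
            score += hit[i];
        }
        int bestScore = score;
        int bestStart = 0;
        // Slide the window one word at a time, updating the score incrementally.
        for (int start = 1; start + w <= words.length; start++) {
            score += hit[start + w - 1] - hit[start - 1];
            if (score > bestScore) {
                bestScore = score;
                bestStart = start;
            }
        }
        if (bestScore == 0) {
            return "";
        }
        return String.join(" ", Arrays.copyOfRange(words, bestStart, bestStart + w));
    }

    public static void main(String[] args) {
        String text = "Lucene is a search library. The highlighter picks the fragments "
                + "that best match the query terms. Other parts are ignored.";
        System.out.println(bestSnippet(text, "highlighter fragments", 5));
    }
}
```

A real highlighter improves on this in several ways, for example by running the text through an analyzer (stemming, stop words) and by weighting rare terms more heavily than common ones, but the select-the-best-scoring-fragment structure is the same.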
Debasis