
We have a large synonym list. I use a custom analyzer to index the search field, and the synonym list is applied through the "SynonymGraphFilterFactory" filter. So far everything works: when I search on the field, I get the matching results. The synonym list looks like this: car, vehicle

If I enter "car" in my search, the correct results are displayed and the word "car" is highlighted.

When I enter the word "vehicle" I get correct results but nothing is highlighted.

I would like both words to be highlighted in the search results: "car" and "vehicle". Is that even possible?

So far I haven't found a suitable solution. Maybe someone can help me here.

Versions: Hibernate Search 6, Lucene Highlighter 8.7

Code:

To index the search field, my analyzer looks like this:

context.analyzer("myCustomAnalyzer").custom()
        .tokenizer(StandardTokenizerFactory.class)
        .tokenFilter(LowerCaseFilterFactory.class)
        .tokenFilter(KeywordRepeatFilterFactory.class)
        .tokenFilter(PorterStemFilterFactory.class)
        .tokenFilter(TrimFilterFactory.class)
        .tokenFilter(SnowballPorterFilterFactory.class)
                .param("language", "German")
        .tokenFilter(RemoveDuplicatesTokenFilterFactory.class)
        .tokenFilter(SynonymGraphFilterFactory.class)
                .param("synonyms", "synonyms/synonyms.properties")
                .param("ignoreCase", "true")
                .param("expand", "true");
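To make the role of `expand` concrete, here is a toy model in plain Java (no Lucene involved) of how a comma-separated line such as `car, vehicle` is interpreted in the Solr synonym format, which is what SynonymGraphFilterFactory reads by default. The class and method names are hypothetical, and the sketch deliberately ignores `=>` mappings, multi-word synonyms and the merging of overlapping rules.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class SynonymExpansionSketch {

    // Toy model only: parses comma-separated equivalence lines such as "car, vehicle".
    // Ignores "=>" mappings, multi-word synonyms and merging of overlapping lines.
    static Map<String, List<String>> parse(List<String> lines, boolean expand) {
        Map<String, List<String>> map = new HashMap<>();
        for (String line : lines) {
            List<String> terms = new ArrayList<>();
            for (String t : line.split(",")) {
                terms.add(t.trim().toLowerCase(Locale.ROOT)); // mimics ignoreCase=true
            }
            // expand=true: every term maps to all terms of the line;
            // expand=false: every term maps to the first term only.
            List<String> targets = expand ? terms : List.of(terms.get(0));
            for (String term : terms) {
                map.put(term, targets);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> rules = List.of("car, vehicle");
        System.out.println(parse(rules, true).get("vehicle"));  // [car, vehicle]
        System.out.println(parse(rules, false).get("vehicle")); // [car]
    }
}
```

With `expand` set to `"true"`, as in the configuration above, each of the two words is treated as standing for both, which is consistent with searches for either word returning the same results.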

Highlighter method looks like this:

private Results highlighting(final Results results, final String mySearchString) {

        final SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter("start", "end");
        final TermQuery query = new TermQuery(
            new Term("indexFieldName", mySearchString));
        final QueryScorer queryScorer = new QueryScorer(query, "indexFieldName");
        final Fragmenter fragmenter = new SimpleSpanFragmenter(queryScorer);
        queryScorer.setExpandMultiTermQuery(true);
        final Highlighter highlighter = new Highlighter(simpleHTMLFormatter, queryScorer);
        highlighter.setTextFragmenter(fragmenter);

        try (Analyzer analyzer = new StandardAnalyzer()) {
            for (final MyEntity my : results.getMyResults()) {
                for (final MySecondEntity sec : my.getMyDescriptions()) {

                    final String text = sec.getMyName();

                    try {
                        final TokenStream tokenStream = analyzer.tokenStream(
                            "indexFieldName", new StringReader(text));
                        final String result = highlighter.getBestFragments(
                            tokenStream, text,
                            sec.getMyName().length(), " ...");
                        if (!StringUtils.isBlank(result)) {

                            sec.setMyName(result);
                        }

                    } catch (final Exception e) {
                        LOG.warn(String.format(
                            "Failure during highlighting process for ..."...
                    }
                }
            }

        }
        return results;
    }

Thank you for your answers

melawa110

1 Answer

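I'm not overly familiar with highlighters, but one thing that seems suspicious in your code is that you're using a StandardAnalyzer to highlight. The highlighter works on the token stream produced by whatever analyzer you pass it, and StandardAnalyzer knows nothing about your synonyms, so the text "car" never yields a "vehicle" token for the query to match. If you want synonyms to be highlighted, I believe you need to use an analyzer that handles synonyms.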

Try using the same analyzer for indexing and highlighting.
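The mechanism can be illustrated with a toy model in plain Java. This is not Lucene's Highlighter API: the tokenizer, the `Token` class and the `highlight` method below are hypothetical simplifications. What it is meant to show is that a synonym filter emits `vehicle` as an extra token carrying the character offsets of `car` in the original text, so a query for `vehicle` can only be mapped back to the word `car` when the text is re-analyzed with a synonym-aware analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SynonymHighlightSketch {

    // A token plus the character offsets it covers in the ORIGINAL text.
    static final class Token {
        final String term;
        final int start;
        final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    // Toy analyzer: splits on letters/digits and lowercases. Each synonym of a word is
    // emitted as an extra token with the SAME offsets as the word, like a synonym filter.
    static List<Token> analyze(String text, Map<String, List<String>> synonyms) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("[\\p{L}\\p{N}]+").matcher(text);
        while (m.find()) {
            String term = m.group().toLowerCase(Locale.ROOT);
            tokens.add(new Token(term, m.start(), m.end()));
            for (String syn : synonyms.getOrDefault(term, List.of())) {
                if (!syn.equals(term)) {
                    tokens.add(new Token(syn, m.start(), m.end()));
                }
            }
        }
        return tokens;
    }

    // Wraps the original text covered by every token equal to the query term,
    // using the same "start"/"end" markers as the SimpleHTMLFormatter in the question.
    static String highlight(String text, String queryTerm, Map<String, List<String>> synonyms) {
        StringBuilder out = new StringBuilder();
        int last = 0;
        for (Token t : analyze(text, synonyms)) {
            if (t.term.equals(queryTerm) && t.start >= last) {
                out.append(text, last, t.start)
                   .append("start").append(text, t.start, t.end).append("end");
                last = t.end;
            }
        }
        return out.append(text, last, text.length()).toString();
    }

    public static void main(String[] args) {
        String text = "A red car";
        Map<String, List<String>> noSynonyms = Map.of(); // no synonym expansion, as with StandardAnalyzer
        Map<String, List<String>> withSynonyms = Map.of("car", List.of("car", "vehicle"));
        System.out.println(highlight(text, "vehicle", noSynonyms));   // A red car
        System.out.println(highlight(text, "vehicle", withSynonyms)); // A red startcarend
    }
}
```

In the real highlighting code, the corresponding change is to pass a synonym-aware analyzer to `analyzer.tokenStream(...)` instead of a `StandardAnalyzer`.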

You can retrieve the analyzer instance from Hibernate Search. See this section of the documentation, or this example:

LuceneBackend luceneBackend =
        Search.mapping( entityManager.getEntityManagerFactory() )
        .backend().unwrap( LuceneBackend.class ); 
Analyzer analyzer = luceneBackend.analyzer( "myCustomAnalyzer" ).get(); 

Then use it instead of new StandardAnalyzer() in your highlighting code. Just make sure you don't close this analyzer (so don't declare it in the try-with-resources block): it is managed by Hibernate Search.

yrodiere
  • Hello again. Thank you for your help. In order to be able to highlight the synonyms, I had to adjust the highlighter method. That was my solution. You were also right - you should always use the same analyzer. Thank you very much :) – melawa110 Jan 26 '21 at 07:20
  • No problem. Feel free to mark the solution as accepted if it solved your problem, or to post your own solution, so that it can help others ;) – yrodiere Jan 26 '21 at 08:04
  • @yrodiere - what if we are using Hibernate Search 6 to Elasticsearch and need to retrieve analyzers? – Rick Gagne Mar 10 '22 at 00:48
  • @RickGagne You can't. And I don't mean "Hibernate Search does not support that", I mean "it doesn't make sense". With Elasticsearch, analyzers live in your Elasticsearch cluster, in separate JVMs, and Apache Lucene may not even be in your application's classpath. The best you can do is to add Apache Lucene to your classpath and try to approximate the Elasticsearch analyzers locally with Lucene, but Hibernate Search won't be able to help you with that. – yrodiere Mar 11 '22 at 08:19
  • @RickGagne If you just want to do highlighting, however, you can use native Elasticsearch features: see https://stackoverflow.com/a/50851926/6692043 – yrodiere Mar 11 '22 at 08:22
  • @yrodiere I'm now migrating from HS5 to HS6 and running into issues similar to the ones from a couple of years back, where you helped immensely. I'm trying to highlight, or basically compare, 2 strings that are not necessarily indexed, looking for matches using defined custom analyzers. I'm OK using Lucene directly and just approximating if that works. In Hibernate Search 5.x there were some difficulties getting the same analyzers defined and working for Lucene as for Elasticsearch, with some loopholes you helped with. Is it more straightforward now? – Rick Gagne Mar 11 '22 at 16:04
  • I'm not sure what you're trying to do, but some things related to analyzers are more straightforward in Hibernate Search 6, yes (in particular search-only analyzers). Working with both Lucene and Elasticsearch in the same app... it depends. As to your particular problem, I'd say a separate question with full context and explanations would be in order. – yrodiere Mar 14 '22 at 08:23