How to collapse duplicates in search results

Question

We use Hibernate Search 6 CR2 with Elasticsearch and Spring Boot 2.4.0. Is there any way to collapse duplicates in search results?

We tried to kind of "collapse" them like this:

    searchResults = searchSession.search(Items.class)
            .select(f -> f.field(field.getCode(), String.class))
            .where(f -> f.phrase()
                    .field(field.getCode())
                    .matching(phrase)
                    .slop(SLOP))
            .fetchHits(20)
            .stream()
            .distinct()
            .collect(Collectors.toList());

...but this method works only on small amount of results (less than fetchHits size) and when there are not so many identical hits. When we tried this method on another index with thousands hits (~28M docs) we saw that it's not working as expected because of fetchHits setting -- some search results that should be -- are lost. And of course, the main question here is that by using this method we don't distinct search results while searching, it happens after the original search, so it's not the best solution.

Another solution was found here but it's a bit outdated and not an actual answer for our question.

Over Hibernate Search forums we found another solution for similar task, we tried to implement it and it worked, but as a downsides we got 2x multiplication for index document fields (8 fields now instead of 4).

So after all, is it possible to tune HS to collapse duplicates in search results without help of these extra-fields? Or, if it's OK... Okay then! We'll remember this and use as a solution in future cases.

P.S.: we implement search-as-you-type prediction service so it's not necessary for original entities to be extracted.

score 2 · Accepted Answer · answered Dec 07 '20 at 13:58

The solution you linked is the most straightforward way to get a list of all values in matched documents for a given field. It is what aggregations are for.

Yes, it requires additional fields. Generally speaking, you can't get performance out of thin air: to get a smaller execution time, you need to use more memory.

That being said, if what you want is suggestions, you should probably have a look at Elasticsearch's suggester feature.

There is no API for this in Hibernate Search (yet), so you will have to transform JSON in order to leverage this feature. It's relatively easy, and you even have an example for your very use case in the reference documentation (have a look at the second example).

Of course if you really want to use phrase queries, it's going to be more complicated. I'd suggest you have a look at the phrase suggester or maybe the completion suggester.

Should you need to register a field with a type that is not supported out of the box by Hibernate Search (e.g. completion), it's possible too: you will just need a custom bridge. See this example.

How to collapse duplicates in search results

1 Answers1