In our application we use Hibernate Search with Elasticsearch as the search engine for end users. The configuration of the search service (let's call it XXXsearch, written in Java) looks like this:

spring:
  profiles: elasticsearch
  elasticsearch:
    rest:
      uris: "http://${XXX_ELASTICSEARCH_HOST:localhost}:${XXX_ELASTICSEARCH_PORT:9200}"
  jpa:
    properties:
      hibernate:
        search:
          default:
            indexmanager: elasticsearch
            elasticsearch:
              host: "http://${XXX_ELASTICSEARCH_HOST:localhost}:${XXX_ELASTICSEARCH_PORT:9200}"
              index_schema_management_strategy: "${XXX_ELASTICSEARCH_SCHEMA_MANAGEMENT_STRATEGY}"
              required_index_status: green

The data for each record in our app is stored in a relational database (Oracle) and propagated to Elasticsearch by Hibernate Search.

To quickly explain our problem: we had an issue with concurrent commits by two pods (Kubernetes) of the search service (XXXsearch). Data was overwritten out of order when commits happened at nearly the same time (milliseconds apart). We added versioning to the records in the database:

@Field(index = Index.YES, analyze = Analyze.NO)
@Version
private int version;

This solved the problem of data being overwritten in the Oracle DB. However, when a search is done by, for instance, the field status with the value 'A', the query returns results with status 'A', 'B' and 'C'. The records in the Oracle DB are up to date, so I think the problem is that Hibernate Search updates the Elasticsearch indexes in batches, which makes out-of-order updates possible.

At the moment, my best idea for resolving this is to use Elasticsearch's versioning, but I cannot find any information on how to configure it in Hibernate Search. I have only found such configuration in Spring Data's documentation.
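
To make the idea concrete, here is a minimal in-memory sketch of what version-checked indexing would do, assuming semantics similar to Elasticsearch's external versioning (a write is applied only if its version is strictly higher than the stored one). The class and method names are hypothetical; this is neither Hibernate Search nor Elasticsearch client code, only an illustration of the behaviour being asked for.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for an index that checks versions on write. */
final class VersionedIndexSketch {

    /** A stored document: the indexed status plus the version it was written with. */
    private static final class Doc {
        final String status;
        final long version;

        Doc(String status, long version) {
            this.status = status;
            this.version = version;
        }
    }

    private final Map<Long, Doc> docs = new HashMap<>();

    /**
     * Applies a write only if its version is strictly greater than the stored one.
     * Returns true if the write was applied, false if it was rejected as stale.
     */
    boolean index(long id, String status, long version) {
        Doc current = docs.get(id);
        if (current != null && version <= current.version) {
            return false; // stale write arriving late: keep the newer document
        }
        docs.put(id, new Doc(status, version));
        return true;
    }

    /** Returns the indexed status for the given id, or null if nothing is indexed. */
    String statusOf(long id) {
        Doc doc = docs.get(id);
        return doc == null ? null : doc.status;
    }

    public static void main(String[] args) {
        VersionedIndexSketch index = new VersionedIndexSketch();
        // The newer commit (version 2, status B) reaches the index first.
        index.index(42L, "B", 2);
        // The older commit (version 1, status A) reaches the index afterwards.
        boolean applied = index.index(42L, "A", 1);
        System.out.println("stale write applied: " + applied);
        System.out.println("indexed status: " + index.statusOf(42L));
    }
}
```

Without the version check, the late write would win and the index would keep status 'A' while the database holds 'B', which matches the symptom described above.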

1 Answer

I don't know if it's your case, but when using @IndexedEmbedded on associations, the document ends up spanning multiple entities, and that can lead to problems similar to what you are experiencing. This is a known limitation, and it is documented here.

Unfortunately, there is (generally) no way to reliably map the DB version (@Version) of the multiple source entities involved to a document version, so Elasticsearch versioning would not help.

This is why Elasticsearch versioning is not currently implemented in Hibernate Search: it would only work in very specific scenarios where you map exactly one entity to exactly one document, which is possible, but not very common. Implementing this would not solve the problem completely, while giving a false sense of security, which is why we decided to not do it, and to work on a better solution.

Instead, Hibernate Search 6.1 (currently in Alpha) is introducing the concept of coordination between nodes, so that a given entity is never indexed concurrently, solving the problem once and for all.

You can find more information here: https://docs.jboss.org/hibernate/search/6.1/reference/en-US/html_single/#coordination-database-polling

(Be careful about the configuration properties: there are typos in the documentation at the moment. hibernate.search.backend.coordination.* should actually be hibernate.search.coordination.*; we'll fix that in the next release.)

For now only static sharding (fixed number of application nodes) is supported, but dynamic sharding (with automatic rebalancing) is being worked on.
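
As a rough illustration of why static sharding prevents concurrent indexing of the same entity, the sketch below assigns each entity id to exactly one of a fixed number of shards. It is a simplified model under stated assumptions, not Hibernate Search's actual implementation: the class name, the use of `hashCode()` as the hash function, and the method names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of one application node that owns a single static shard. */
final class StaticShardingSketch {

    private final int totalShards;
    private final int assignedShard;

    StaticShardingSketch(int totalShards, int assignedShard) {
        if (totalShards <= 0 || assignedShard < 0 || assignedShard >= totalShards) {
            throw new IllegalArgumentException("assignedShard must be in [0, totalShards)");
        }
        this.totalShards = totalShards;
        this.assignedShard = assignedShard;
    }

    /** Maps an entity id to a shard; floorMod keeps the result non-negative. */
    static int shardOf(Object entityId, int totalShards) {
        return Math.floorMod(entityId.hashCode(), totalShards);
    }

    /** True if this node is the one responsible for indexing the given entity. */
    boolean owns(Object entityId) {
        return shardOf(entityId, totalShards) == assignedShard;
    }

    /** Keeps only the changed ids that this node should index itself. */
    List<Long> selectOwned(List<Long> changedIds) {
        List<Long> owned = new ArrayList<>();
        for (Long id : changedIds) {
            if (owns(id)) {
                owned.add(id);
            }
        }
        return owned;
    }
}
```

Because the number of shards is fixed and the mapping is deterministic, every change to a given entity is processed by the same node, so two nodes never race on the same document. It also shows why adding or removing nodes is not transparent: changing the shard count changes the mapping.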

In the meantime, if you have an auto-scaling application cluster, you can have a fixed subset of your nodes (e.g. 4 "primary" nodes, always up) perform automatic indexing, while other nodes only collect entity change events and don't perform automatic indexing themselves.
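
A possible way to express that split is to compute the coordination properties per node, as in the sketch below. The helper class is hypothetical, and the property names and the `outbox-polling` value are assumptions based on later 6.1 releases under the hibernate.search.coordination.* prefix; the Alpha may use different names, so check the reference documentation for your exact version before relying on them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that builds coordination settings for one application node. */
final class CoordinationPropertiesSketch {

    /**
     * Primary nodes process indexing events for one fixed shard;
     * other nodes only record entity change events and leave processing to the primaries.
     * Property names are assumptions; verify them against your Hibernate Search version.
     */
    static Map<String, String> forNode(boolean primary, int totalShards, int assignedShard) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("hibernate.search.coordination.strategy", "outbox-polling");
        if (primary) {
            props.put("hibernate.search.coordination.event_processor.shards.total_count",
                    String.valueOf(totalShards));
            props.put("hibernate.search.coordination.event_processor.shards.assigned",
                    String.valueOf(assignedShard));
        } else {
            // Non-primary node: do not process events on this node.
            props.put("hibernate.search.coordination.event_processor.enabled", "false");
        }
        return props;
    }
}
```

With 4 primary nodes, each one would be started with `forNode(true, 4, n)` for a distinct `n` from 0 to 3, and every auto-scaled node with `forNode(false, 4, 0)`, where the shard arguments are ignored. In a Spring Boot setup like the one in the question, the same keys could presumably be set under spring.jpa.properties instead.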

Eventually we aim to provide alternative coordination strategies that do not rely on additional tables in the database (e.g. using Debezium), but that will come later.

yrodiere