I'm working on a solution with two databases, used for the following purposes:
- An Elasticsearch cluster used for search
- A Postgres database that acts as a source of truth for the data
Our application allows users to retrieve and update products, and a product has multiple attributes: name, price, description, etc. Two typical use cases are:
- Retrieve products by name: a search is performed against Elasticsearch, and the IDs it returns are then used in a second query against Postgres to fetch the authoritative data (so we get fast searches on big tables while still serving trustworthy data)
- Update product fields: we allow users to update any product information (kind of a collaborative wiki). We first write the data to Postgres, then index it into Elasticsearch.
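In toy form, the two paths above look like the sketch below. Since the real setup needs live services, stdlib `sqlite3` stands in for Postgres and a plain dict stands in for the Elasticsearch index; all table, field, and function names are illustrative, not our actual schema:

```python
# Toy model of the current architecture. sqlite3 stands in for Postgres,
# a dict stands in for the Elasticsearch index; names are illustrative.
import sqlite3

pg = sqlite3.connect(":memory:")
pg.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
pg.execute("INSERT INTO products VALUES (1, 'Pear', 0.5)")
pg.commit()

es_index = {1: {"name": "Pear"}}  # stand-in for the ES document store

def update_product(product_id, name):
    # Write path: Postgres first, then Elasticsearch -- the unordered
    # dual write where the race condition appears.
    pg.execute("UPDATE products SET name = ? WHERE id = ?", (name, product_id))
    pg.commit()
    es_index[product_id] = {"name": name}  # second write, no ordering guarantee

def search_by_name(term):
    # Read path: resolve IDs in "ES", then fetch authoritative rows
    # from "Postgres".
    ids = [pid for pid, doc in es_index.items()
           if term.lower() in doc["name"].lower()]
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    return pg.execute(
        f"SELECT id, name, price FROM products WHERE id IN ({placeholders})",
        ids,
    ).fetchall()

update_product(1, "Banana")
print(search_by_name("ban"))  # [(1, 'Banana', 0.5)]
```

With a single writer this works fine; the trouble starts when two updates to the same product interleave between the two writes.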
However, as I feared, race conditions appeared as the number of users grew: if user #1 changed a product's name to "Banana" while user #2 simultaneously changed the same product's name to "Apple", Elasticsearch would sometimes end up with "Banana" as the last value saved while Postgres held "Apple", creating a serious inconsistency between the two databases.
So I've ventured into reading about optimistic/pessimistic locking to solve my problem, but so far all the articles I find deal with a single relational database, and the solutions offered rely on ORM implementations (e.g. Hibernate). Our combined storage solution of ES + Postgres requires more "ballet" than that.
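For concreteness, here is my understanding of the optimistic-locking pattern those articles describe, translated off the ORM and with the row version also propagated to the search index. This is only a sketch of the idea, not something we run: `sqlite3` and a dict again stand in for Postgres and ES, and the `version` column, `update_name` helper, and last-write check are all hypothetical (real Elasticsearch can, as I understand it, enforce a similar ordering via external versioning on the index call):

```python
# Sketch of optimistic locking without an ORM, assuming a hypothetical
# `version` column on the products table. sqlite3 stands in for Postgres,
# a dict for Elasticsearch; all names are illustrative.
import sqlite3

pg = sqlite3.connect(":memory:")
pg.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, version INTEGER)")
pg.execute("INSERT INTO products VALUES (1, 'Pear', 1)")
pg.commit()

es_index = {}  # stand-in for ES; the version check below mimics what
               # ES external versioning would do on the index call

def update_name(product_id, new_name, expected_version):
    # Compare-and-swap: the UPDATE only succeeds if nobody else bumped
    # the version since this caller read the row.
    cur = pg.execute(
        "UPDATE products SET name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_name, product_id, expected_version),
    )
    pg.commit()
    if cur.rowcount == 0:
        return False  # stale write: caller must re-read and retry
    new_version = expected_version + 1
    # Propagate the Postgres version to "ES", dropping the write if a
    # newer version is already indexed.
    current = es_index.get(product_id)
    if current is None or current["version"] < new_version:
        es_index[product_id] = {"name": new_name, "version": new_version}
    return True

# Two users both read version 1, then both try to write:
assert update_name(1, "Banana", 1) is True   # first writer wins
assert update_name(1, "Apple", 1) is False   # second is rejected, must retry
```

This handles lost updates inside Postgres, but I'm unsure it is the right way to keep the two stores consistent, which is what leads to my question.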
What techniques/options are available to me to solve this kind of problem?