
I am facing a problem with Spring Data and Elasticsearch. I need to query an entity, partially add or update its fields, and in the end figure out whether it is complete.

A sample of the model is below:

public class Entity {
  @Id private String id;
  private String country;
  private String city;
  private boolean completed;
  ....
}
  • Four fields like country and city arrive asynchronously in my application (in random order, subject to race conditions, etc.).
  • I need to set the completed field and send a notification once all fields are set.

In SQL, the naive query-and-modify implementation is doable: query by id, set the fields, check whether all are filled, set completed, and save the entity, all in one transaction with @Transactional.

How can I approach this with Elasticsearch?

I tried setting up a repository like the one below for CRUD operations:

public interface EntityRepository
  extends ElasticsearchRepository<Entity, String> {
}

but this will not work: between the query and the save or update, the data changes, so the saved entity ends up with incomplete fields.

From searching other answers, I saw that I could maybe do a partial update using ElasticsearchTemplate, which seems a better approach, e.g. update only the city field when it arrives:

Partial update with Spring Data Elasticsearch repository

But how would I set the completed field? Query later? I believe that will not work.

Let's say I receive city and update it, but before I query, the country arrives and is set. After both updates, both queries will return a completed entity and send two notifications, or something like that.

Sergey Tsypanov
thahgr

1 Answer


In Elasticsearch, single-document operations are atomic, so considering the case you've described:

Let's say I receive city and update it, but before I query, the country arrives and is set. After both updates, both queries will return a completed entity and send two notifications, or something like that.

you shouldn't worry about data races, because even if you asynchronously execute two updates as:

POST /myindex/_update/id
{
    "doc" : {
       "city": "smth"
    }
}

and

POST /myindex/_update/id
{
    "doc" : {
       "country": "smth"
    }
}

they won't be racy.

After each update, the ES index can be in only one of two states:

  • one field is updated
  • both fields are updated

So a query against the index will return the entity in a consistent state.
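The two possible states can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `PartialUpdateDemo`, `store`, and `partialUpdate` are hypothetical names, and a `ConcurrentHashMap` stands in for the index, with `compute` playing the role of the atomic single-document update. It is not Elasticsearch client code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PartialUpdateDemo {
    // Hypothetical stand-in for the index: document id -> field map.
    static final Map<String, Map<String, String>> store = new ConcurrentHashMap<>();

    // Merges one field into the document atomically, like a partial update.
    static void partialUpdate(String id, String field, String value) {
        store.compute(id, (key, doc) -> {
            // Copy the current fields so readers never see a half-merged map.
            Map<String, String> merged = (doc == null) ? new HashMap<>() : new HashMap<>(doc);
            merged.put(field, value);
            return merged;
        });
    }

    public static void main(String[] args) throws InterruptedException {
        Thread city = new Thread(() -> partialUpdate("1", "city", "smth"));
        Thread country = new Thread(() -> partialUpdate("1", "country", "smth"));
        city.start();
        country.start();
        city.join();
        country.join();
        // Whatever the interleaving, neither update overwrites the other.
        System.out.println(store.get("1"));
    }
}
```

A reader that looks at the document between the two updates sees exactly one field; after both have finished it sees both, regardless of which thread ran first.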

With this in mind, after both updates just add a query that returns the entity by id, then do the check for setting the flag and sending the notification:

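// findById returns an Optional in Spring Data repositories
Entity e = entityRepository.findById(id).orElseThrow();
// Flag and notify only once every field has arrived and the flag is still unset
if (!e.isCompleted() && e.getCity() != null && e.getCountry() != null) {
  updateCompletedFlag(e);
  notifyAboutCompletion(e);
}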
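If two callers can run this check at the same time, both may read the flag as unset before either writes it. One way to guard against that, offered here as a sketch and not as part of the approach above, is to make the flag transition itself a compare-and-set so that only one caller wins. Below, an `AtomicBoolean` stands in for a conditional write (Elasticsearch exposes `if_seq_no` and `if_primary_term` for optimistic concurrency control); `CompletionGate` and `tryComplete` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class CompletionGate {
    private final AtomicBoolean completed = new AtomicBoolean(false);

    // Returns true for exactly one caller: the one that flips false -> true.
    public boolean tryComplete() {
        return completed.compareAndSet(false, true);
    }

    public static void main(String[] args) throws InterruptedException {
        CompletionGate gate = new CompletionGate();
        AtomicInteger notifications = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch start = new CountDownLatch(1);
        for (int i = 0; i < 2; i++) {
            pool.submit(() -> {
                try {
                    start.await(); // release both workers at the same moment
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                    return;
                }
                if (gate.tryComplete()) {
                    notifications.incrementAndGet(); // stands in for sending the notification
                }
            });
        }
        start.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("notifications sent: " + notifications.get()); // prints "notifications sent: 1"
    }
}
```

The caller that loses the compare-and-set simply skips the notification, so the order in which the two field updates arrive no longer matters.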
Sergey Tsypanov
  • Indeed, it is true that ES will have either one or both fields updated. However, the problem is that both updates can happen before the reads, so both reads see the entity as completed and both send notifications; it depends on the race condition. Additionally, the read and the update happen as different transactions – thahgr Jul 13 '23 at 09:18
  • @thahgr there are no transactions in ES, and provided you check that both fields are not null, there'll be only one notification. Try it out – Sergey Tsypanov Jul 13 '23 at 09:26