I have a use case where the backend store is S3 and we want to power search through Elasticsearch. One option is to update S3 and the index at the same time, synchronously within the write call.

Most of the setups I have seen update the index asynchronously. One obvious downside of synchronous updates is having to handle the failure case where the update to S3 succeeds but the index update fails.

What are the points against having synchronous updates if latency is not an issue?

Abhay Dubey

1 Answer

If you index first and then store, and storing fails, you need to remove the indexed document (otherwise someone will be able to find it in a search and might get the wrong impression that it exists when it doesn't). If storage failures are relatively rare, this approach probably pays off, but you need to verify that for your own system.
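
The rollback described above could be sketched roughly as follows. This is only an illustration: `InMemoryIndex`, `FailingStore`, and `index_then_store` are hypothetical stand-ins, not the Elasticsearch or S3 client APIs, and the store is made to fail on purpose to show the compensation step.

```python
class InMemoryIndex:
    """Hypothetical stand-in for a search index (not a real client)."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, doc):
        self.docs[doc_id] = doc

    def delete(self, doc_id):
        self.docs.pop(doc_id, None)


class FailingStore:
    """Hypothetical stand-in for an object store whose writes always fail."""

    def put(self, key, body):
        raise IOError("simulated storage failure")


def index_then_store(index, store, doc_id, doc):
    """Index first, then store; undo the index write if storing fails."""
    index.index(doc_id, doc)
    try:
        store.put(doc_id, doc)
    except Exception:
        # Compensate so that a search cannot return an object that was never stored.
        index.delete(doc_id)
        raise


index = InMemoryIndex()
try:
    index_then_store(index, FailingStore(), "obj-1", {"title": "hello"})
except IOError:
    pass  # the caller still sees the failure; the index has been cleaned up

print(index.docs)
```

Note that the compensating delete can itself fail, which is one reason storing first tends to be simpler.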

On the other hand, if you store first and then index, and process the objects in parallel, you get much the same throughput: while one object is being stored, another is being indexed, and an object still never becomes searchable before it is stored. That way, you won't need to roll back any operations on your index.
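
A minimal sketch of that store-then-index pipeline, assuming plain dictionaries as stand-ins for S3 and the search index and a single indexing thread fed by a queue (all names here are hypothetical):

```python
import queue
import threading

DONE = object()  # sentinel telling the indexer there is nothing more to process


def run_pipeline(docs):
    """Store each object, then hand it to an indexing thread.

    The indexer only ever sees objects that were already stored, so nothing
    becomes searchable before it exists in the store.
    """
    stored = {}    # stand-in for S3
    indexed = {}   # stand-in for the search index
    to_index = queue.Queue()

    def indexer():
        while True:
            item = to_index.get()
            if item is DONE:
                break
            key, doc = item
            indexed[key] = doc

    worker = threading.Thread(target=indexer)
    worker.start()
    for key, doc in docs.items():
        stored[key] = doc           # stage 1: store
        to_index.put((key, doc))    # stage 2 overlaps with the next store
    to_index.put(DONE)
    worker.join()
    return stored, indexed


stored, indexed = run_pipeline({"a": {"n": 1}, "b": {"n": 2}})
print(sorted(indexed))
```

In this sketch a failed store would simply never enqueue the object, so there is nothing to undo in the index.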

Haris Osmanagić
  • As you said, indexing before storing doesn't make sense. The only question is whether I should fail the write if writing to the index fails. The problem with writing in parallel is that we need to handle both failure cases (writing to S3 fails or writing to the index fails). – Abhay Dubey Jul 27 '17 at 14:40
  • That depends completely on your use case. Is it a must for the objects to be searchable? Is there any other way to access them (e.g. is their S3 key stored somewhere else)? Will you retry indexing? And so on. – Haris Osmanagić Jul 28 '17 at 07:59
  • (1) It is a must for the objects to be searchable. (2) The objects can be read by S3 key in a few use cases where we have the primary ID and don't need to search on other attributes. (3) As write latency is not a major concern, we will retry indexing a few times if it fails. If the index write still fails, we can simply fail the call. Please let me know if you have ever used something like this or see any major issues with this approach. – Abhay Dubey Jul 29 '17 at 09:13
  • The flow you described is pretty generic, and the details depend solely on your use case. There are many more factors that need to be taken into account, such as the availability of objects (can they disappear quickly?), whether you need (near) real-time search, and so on. – Haris Osmanagić Aug 08 '17 at 13:35
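
The flow discussed in the comments (store, then retry the index write a few times, then fail the call) might look roughly like this. It is a sketch under assumptions: `write_object`, `IndexWriteError`, and the `store_put` / `index_put` callables are hypothetical placeholders for whatever S3 and Elasticsearch clients are actually in use.

```python
import time


class IndexWriteError(Exception):
    """Raised when the index write still fails after all retries (hypothetical)."""


def write_object(store_put, index_put, key, doc, attempts=3, delay=0.0):
    """Store the object, then try to index it up to `attempts` times.

    Returns the number of index attempts used. A store failure propagates
    before anything is indexed.
    """
    store_put(key, doc)
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            index_put(key, doc)
            return attempt
        except Exception as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)  # a real system might back off exponentially
    # The object is stored but not searchable at this point; the caller has to
    # decide whether to delete it or reconcile the index later.
    raise IndexWriteError(
        f"indexing {key!r} failed after {attempts} attempts"
    ) from last_error


# Usage: an index that fails twice and then succeeds.
store, index, calls = {}, {}, {"n": 0}


def flaky_index_put(key, doc):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("simulated index outage")
    index[key] = doc


used = write_object(store.__setitem__, flaky_index_put, "obj-1", {"title": "hello"})
print(used, index == store)
```

When the retries are exhausted, failing the call does not undo the S3 write, so the "stored but not indexed" case from the question still needs a cleanup or reconciliation path.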