Imagine a file upload to Amazon S3; on success, the file's location (and metadata) is stored in a record in Elasticsearch. OK, that's fine.
But how do you ensure data consistency (ACID?) if one of the requests fails? For example, if the Elasticsearch service is unavailable:
- the file on S3 must be deleted
- but what if the delete on S3 fails?
This would lead to an inconsistent state.
So the question is: how do you keep these two stores in sync?
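To make the failure mode concrete, here is a minimal sketch of the upload flow with a compensating delete. The `FakeS3`/`FakeES` classes and method names (`put`, `index`, `delete`) are hypothetical in-memory stand-ins, not a real SDK; real code would use the AWS and Elasticsearch clients.

```python
class Unavailable(Exception):
    """Raised when a service cannot be reached (stand-in for a network error)."""


class FakeS3:
    # In-memory stand-in for an S3 bucket (assumption, not boto3).
    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def delete(self, key):
        self.objects.pop(key, None)


class FakeES:
    # In-memory stand-in for an Elasticsearch index; can simulate downtime.
    def __init__(self, available=True):
        self.available = available
        self.docs = {}

    def index(self, doc_id, doc):
        if not self.available:
            raise Unavailable("elasticsearch is down")
        self.docs[doc_id] = doc


def upload(s3, es, key, data):
    s3.put(key, data)
    try:
        es.index(key, {"location": f"s3://bucket/{key}"})
    except Unavailable:
        # Compensating action: undo the S3 write.
        # If THIS delete also fails, we are left in exactly the
        # inconsistent state described above.
        s3.delete(key)
        raise
```

The point of the sketch is that the compensating delete is itself a remote call that can fail, so it reduces the window of inconsistency but cannot eliminate it.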
Ideas so far:
- If the user requests the Elasticsearch record and nothing is found, the orphaned file on S3 is deleted on the spot. (meh)
- Batch jobs that scan for inconsistencies and remove them.
- Wrap both requests in a database transaction; if one fails, roll back and retry later (queue, jobs = overkill?).
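The batch-job idea could be sketched roughly like this. The `Store` class is again a hypothetical in-memory stand-in; a real job would list the bucket's keys (e.g. via the S3 API) and fetch the document IDs from Elasticsearch, then delete the difference.

```python
class Store:
    # Hypothetical stand-in for either S3 or the Elasticsearch index;
    # real code would call the respective service APIs instead.
    def __init__(self, items):
        self.items = dict(items)

    def keys(self):
        return set(self.items)

    def delete(self, key):
        self.items.pop(key, None)


def find_orphans(s3_keys, es_ids):
    # Objects present in S3 but missing from Elasticsearch are
    # leftovers from a failed compensating delete.
    return sorted(set(s3_keys) - set(es_ids))


def reconcile(s3, es):
    # Periodic batch job: remove orphaned S3 objects so the two
    # stores converge back to a consistent state.
    for key in find_orphans(s3.keys(), es.keys()):
        s3.delete(key)
```

Note that a real job would have to be careful about in-flight uploads: a file that was just written to S3 but not yet indexed looks like an orphan, so the job should only delete objects older than some grace period.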