
Imagine a file is uploaded to Amazon S3 and, on success, the file's location (and some metadata) is stored in an Elasticsearch record. OK, that's fine.

But how do you ensure data consistency (ACID?) if one of the requests fails? For example, if the Elasticsearch service is unavailable:

  • the file on S3 must be deleted
  • but what if the delete on S3 fails?

This would lead to an inconsistent state.

So the question is: how do you keep these two stores in sync?
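For concreteness, a minimal sketch of the flow in Python (bucket and index names are made up; assuming boto3 and the elasticsearch-py 8.x client). Note the failure window: if the Elasticsearch write fails and the compensating S3 delete also fails, the two stores diverge:

    import boto3
    from elasticsearch import Elasticsearch

    s3 = boto3.client("s3")
    es = Elasticsearch("http://localhost:9200")

    def upload_with_metadata(key, data, metadata):
        s3.put_object(Bucket="my-uploads", Key=key, Body=data)
        try:
            # Store the file's location plus metadata as a record.
            es.index(index="files", id=key, document={"s3_key": key, **metadata})
        except Exception:
            # Compensating action -- but this delete can itself fail,
            # which is exactly the inconsistent state described above.
            s3.delete_object(Bucket="my-uploads", Key=key)
            raise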

Ideas are:

  • If the state is inconsistent and a user requests the Elasticsearch record but finds nothing, delete the orphaned S3 object on the spot. (meh)
  • Batch jobs that scan both stores for inconsistencies and remove them (see the sketch after this list).
  • Run both requests inside a database transaction, and if one fails, roll back and retry later via a queue of jobs (overkill?).
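A rough sketch of the batch-reconciliation idea from the list above (names are hypothetical; same clients as before). It scans the bucket and deletes any object that has no matching Elasticsearch record; you would run it on a schedule:

    import boto3
    from elasticsearch import Elasticsearch

    s3 = boto3.client("s3")
    es = Elasticsearch("http://localhost:9200")

    def reconcile(bucket="my-uploads", index="files"):
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=bucket):
            for obj in page.get("Contents", []):
                key = obj["Key"]
                if not es.exists(index=index, id=key):
                    # Orphan: the upload succeeded but the index write never did.
                    s3.delete_object(Bucket=bucket, Key=key)

(Beware the race with in-flight uploads: an object can exist in S3 for a moment before its Elasticsearch record is written, so only reconcile objects older than some grace period.)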
true_gler

1 Answer


ACID is impossible in this case: S3 and Elasticsearch are two separate systems with no shared transaction, so the best you can achieve is eventual consistency.

Your third suggestion is the closest to best practice.

The reference architecture for this system is to write the object to AWS S3, then use an S3 bucket notification to trigger an AWS Lambda function that performs the write to Elasticsearch. If a write fails anywhere along the S3 -> Lambda -> Elasticsearch path, route the failed event to a dead-letter SQS queue, then flush that queue periodically with a second Lambda triggered on a timed schedule by CloudWatch Events.
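A minimal sketch of the indexing Lambda (endpoint and index names are made up; assuming the elasticsearch-py 8.x client is bundled with the function). S3 invokes it asynchronously with an event describing the new object; letting the exception propagate is deliberate, since failed async invocations are what end up on the function's dead-letter queue:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://my-es-endpoint:9200")

    def handler(event, context):
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            # On failure, raise: Lambda retries, then routes the event
            # to the configured dead-letter SQS queue.
            es.index(index="files", id=key,
                     document={"bucket": bucket, "s3_key": key})

The scheduled CloudWatch Events -> Lambda drain would then call receive_message on the SQS queue and replay the same indexing call for each stored event.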

Andrew Templeton