I am using Wasabi as S3 storage for my project, and I've been thinking of utilizing S3 as a key-value store. Wasabi does not charge for API requests, as noted here: https://wasabi.com/cloud-storage-pricing/
And anyone can easily (in almost any programming language) implement such a simple interface to Amazon S3:
value = store.get(key)
store.put(key, value)
store.delete(key)
where the key is a string and the value is binary data, effectively using it as a highly distributed and elastic key-value store.
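A minimal sketch of that interface in Python. The in-memory dict here stands in for the bucket so the example is self-contained; the comments note the boto3 calls (get_object, put_object, delete_object) a real implementation against Wasabi/S3 would make instead:

```python
class S3KeyValueStore:
    """Minimal get/put/delete key-value interface.

    Backed by a dict for illustration; a real implementation would
    hold a boto3 S3 client and map each method to one API call.
    """

    def __init__(self):
        # Real version: self.s3 = boto3.client("s3", endpoint_url=<wasabi endpoint>)
        self._objects = {}

    def get(self, key: str) -> bytes:
        # Real version: self.s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        return self._objects[key]

    def put(self, key: str, value: bytes) -> None:
        # Real version: self.s3.put_object(Bucket=bucket, Key=key, Body=value)
        self._objects[key] = value

    def delete(self, key: str) -> None:
        # Real version: self.s3.delete_object(Bucket=bucket, Key=key)
        self._objects.pop(key, None)


store = S3KeyValueStore()
store.put("userid:1234567890:username", b"johnsmith")
print(store.get("userid:1234567890:username"))  # b'johnsmith'
```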
So one can store a User object for example with
userid:1234567890:username -> johnsmith
userid:1234567890:username:johnsmith:password -> encrypted_password
userid:1234567890:username:johnsmith:profile_picture -> image_binary
userid:1234567890:username:johnsmith:fav_color -> red
And so on, with values serialized into binary.
I have a few questions for those who have tried to use S3 as a database or datastore: what's the best strategy for using Amazon S3 as a key-value store? Although I think it's fairly easy to retrieve the whole user object described above by listing keys with the prefix userid:1234567890 and doing the needed logic in code, the obvious downside is that you can't search by value.
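That prefix retrieval can be sketched as below. A dict again stands in for the bucket; against real S3 this would be list_objects_v2(Bucket=..., Prefix=...) (paginated) followed by one get_object per returned key:

```python
def get_by_prefix(objects: dict, prefix: str) -> dict:
    """Return every (key, value) pair whose key starts with the prefix.

    Real S3 version: page through list_objects_v2(Prefix=prefix),
    then issue one get_object per returned key.
    """
    return {k: v for k, v in objects.items() if k.startswith(prefix)}


bucket = {
    "userid:1234567890:username": b"johnsmith",
    "userid:1234567890:fav_color": b"red",
    "userid:9999999999:username": b"janedoe",
}
user = get_by_prefix(bucket, "userid:1234567890")
print(sorted(user))  # only the two keys belonging to user 1234567890
```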
- What algorithm can be used to implement a simple key-search function, e.g. search for a user whose username starts with "j", or a user whose fav_color is "red"? Looking at the very basic key-value interface of get and put, I think this is impossible, but maybe someone knows a workaround?
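One workaround I can imagine (my own convention, nothing S3 provides natively) is to maintain secondary-index keys alongside the data, embedding the value in the index key so that "search by value" becomes a prefix scan over the index. A sketch, with a dict standing in for the bucket:

```python
def put_with_index(objects: dict, user_id: str, field: str, value: str) -> None:
    """Write the data key plus an inverted-index key.

    The index key embeds the value, so "find users with fav_color red"
    becomes a prefix scan over "index:fav_color:red:".
    """
    objects[f"userid:{user_id}:{field}"] = value.encode()
    objects[f"index:{field}:{value}:{user_id}"] = b""  # empty marker object


def search(objects: dict, field: str, value_prefix: str) -> list:
    """Return user ids whose field value starts with value_prefix."""
    prefix = f"index:{field}:{value_prefix}"
    return sorted(k.rsplit(":", 1)[1] for k in objects if k.startswith(prefix))


bucket = {}
put_with_index(bucket, "1234567890", "fav_color", "red")
put_with_index(bucket, "1234567890", "username", "johnsmith")
put_with_index(bucket, "5555555555", "username", "jake")
print(search(bucket, "fav_color", "red"))  # ['1234567890']
print(search(bucket, "username", "j"))     # ['1234567890', '5555555555']
```

The cost is that every write becomes two writes, and the index must be updated (old entry deleted, new one written) whenever the value changes.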
- What serialization strategy is best for both primitive data types (String, Number, Boolean, etc.) and blob data (images, audio, video, and any sort of file) in this kind of key-value store? Also, this simple key-value interface has no way to declare what type of value is stored under a key (is it a string, a number, binary data, etc.?); how can that be solved?
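For the typing problem, one option (my own convention, not anything S3-specific) is to prepend a one-byte type tag to every stored value: blobs stay raw bytes while primitives and structures go through JSON. A sketch:

```python
import json

# One-byte tags identifying how the rest of the value is encoded.
TAG_JSON = b"J"   # primitives/structures serialized as UTF-8 JSON
TAG_BLOB = b"B"   # raw binary (images, audio, video, files)

def serialize(value) -> bytes:
    """Tag the value so its type survives the round trip to binary."""
    if isinstance(value, (bytes, bytearray)):
        return TAG_BLOB + bytes(value)
    return TAG_JSON + json.dumps(value).encode("utf-8")

def deserialize(data: bytes):
    """Dispatch on the leading tag byte to recover the original value."""
    tag, payload = data[:1], data[1:]
    if tag == TAG_BLOB:
        return payload
    return json.loads(payload.decode("utf-8"))


print(deserialize(serialize(42)))          # 42
print(deserialize(serialize("red")))       # red
print(deserialize(serialize(b"\x89PNG")))  # b'\x89PNG'
```

An alternative that avoids touching the payload at all is S3 object metadata: put_object accepts a ContentType and a Metadata dict of custom headers, so the type could live there instead of in the value.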
- How can transactions be achieved in this kind of scenario? For example, store the username johnsmith if and only if the other keys are also stored, or not at all. Is S3 Batch Operations enough to solve this?
- What are the main design considerations when planning to use this as the main database for applications (and for production use), both from an algorithmic perspective and considering the limitations of S3 itself?