
I am using Wasabi as S3 storage for my project, and I've been thinking of using S3 as a key-value store. Wasabi does not charge for API requests, as noted here: https://wasabi.com/cloud-storage-pricing/

Anyone can easily implement, in practically any programming language, a simple interface like this on top of Amazon S3:

value = store.get(key)
store.put(key, value)
store.delete(key)

Here the key is a string and the value is binary data, which effectively makes S3 a highly distributed and elastic key-value store.
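As a rough illustration, the three calls above could be wrapped like this in Python. This is only a sketch: `S3KeyValueStore` and `InMemoryS3Client` are hypothetical names, and the in-memory client is a stand-in that mimics the `put_object`, `get_object` and `delete_object` calls of a boto3 S3 client so the example runs without credentials.

```python
import io


class InMemoryS3Client:
    """Hypothetical stand-in mimicking three boto3 S3 client calls."""

    def __init__(self):
        self._objects = {}

    def put_object(self, Bucket, Key, Body):
        self._objects[(Bucket, Key)] = bytes(Body)
        return {}

    def get_object(self, Bucket, Key):
        # A real S3 client raises a ClientError for a missing key;
        # this stand-in raises KeyError instead.
        return {"Body": io.BytesIO(self._objects[(Bucket, Key)])}

    def delete_object(self, Bucket, Key):
        # Like S3, deleting a missing key is not an error.
        self._objects.pop((Bucket, Key), None)
        return {}


class S3KeyValueStore:
    """Minimal get/put/delete wrapper: string keys, binary values."""

    def __init__(self, client, bucket):
        self._client = client
        self._bucket = bucket

    def get(self, key: str) -> bytes:
        response = self._client.get_object(Bucket=self._bucket, Key=key)
        return response["Body"].read()

    def put(self, key: str, value: bytes) -> None:
        self._client.put_object(Bucket=self._bucket, Key=key, Body=value)

    def delete(self, key: str) -> None:
        self._client.delete_object(Bucket=self._bucket, Key=key)


store = S3KeyValueStore(InMemoryS3Client(), "demo-bucket")
store.put("userid:1234567890:username", b"johnsmith")
value = store.get("userid:1234567890:username")
```

Against a real bucket, the same wrapper should accept a client created with `boto3.client("s3", endpoint_url=...)`, where the endpoint URL for Wasabi is something I would have to take from their documentation.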

So one can store a User object, for example, with:

userid:1234567890:username -> johnsmith
userid:1234567890:username:johnsmith:password -> encrypted_password
userid:1234567890:username:johnsmith:profile_picture -> image_binary
userid:1234567890:username:johnsmith:fav_color -> red

Values are serialized into binary.

And so on.
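To make the "serialized into binary" part concrete, here is one possible scheme, sketched under my own assumptions: a one-byte type tag is prepended to each value so the reader knows how to decode it. The tag letters and the `encode`/`decode` names are made up for illustration and are not part of any S3 API.

```python
import struct


def encode(value) -> bytes:
    """Serialize a value as a one-byte type tag followed by its payload."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return b"b" + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii")
    if isinstance(value, float):
        return b"f" + struct.pack(">d", value)  # big-endian IEEE 754 double
    if isinstance(value, str):
        return b"s" + value.encode("utf-8")
    if isinstance(value, (bytes, bytearray)):
        return b"x" + bytes(value)  # raw blob: image, audio, any file
    raise TypeError(f"unsupported type: {type(value).__name__}")


def decode(data: bytes):
    """Inverse of encode(): inspect the tag, then decode the payload."""
    tag, payload = data[:1], data[1:]
    if tag == b"b":
        return payload == b"\x01"
    if tag == b"i":
        return int(payload.decode("ascii"))
    if tag == b"f":
        return struct.unpack(">d", payload)[0]
    if tag == b"s":
        return payload.decode("utf-8")
    if tag == b"x":
        return payload
    raise ValueError(f"unknown type tag: {tag!r}")


fav_color = decode(encode("red"))
```

An alternative that avoids touching the payload would be to record the type in the object's Content-Type or user-defined metadata, which S3 stores alongside each object.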

I have a few questions about the best strategy for using Amazon S3 as a key-value store, for those who have tried to use S3 as either a database or a datastore. I think it's fairly easy to retrieve the whole user object described here by querying keys with the prefix userid:1234567890 and doing the necessary logic in code, but the obvious downside is that you can't search by value.

  1. What algorithm can be used to implement a simple search function, e.g. searching for a user whose username starts with "j" or whose fav_color is "red"? Looking at the very basic key-value interface of get and put, I think this is impossible, but maybe someone knows a work-around?
  2. What kind of serialization strategy is best for this kind of key-value store, for both primitive data types (String, Number, Boolean, etc.) and blob data (images, audio, video, and any sort of file)? Also, this simple key-value interface has no way to define what type of value is stored under a key (is it a string, number, binary, etc.?). How can that be solved?
  3. How can transactions be achieved in this kind of scenario? In the example above, the username johnsmith should be stored if and only if the other keys are also stored, or nothing should be stored at all. Is an S3 batch operation enough to solve this?
  4. What are the main design considerations when planning to use this as the main database for applications (and for production use), both from an algorithmic perspective and considering the limitations of S3 itself?
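To show what I mean by prefix retrieval, and the kind of work-around I am imagining for question 1, here is a sketch. It assumes the store can list keys by prefix (as S3's ListObjectsV2 does with its Prefix parameter) and that extra "index" keys are written with the searchable value embedded in the key. `PrefixStore` is a hypothetical in-memory stand-in, the key layout is simplified from the example above, and all function names are invented for illustration.

```python
class PrefixStore:
    """Hypothetical in-memory stand-in for a store with prefix listing."""

    def __init__(self):
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> bytes:
        return self._data[key]

    def list_keys(self, prefix: str) -> list:
        return sorted(k for k in self._data if k.startswith(prefix))


def save_user(store, user_id, username, fav_color):
    # Primary records.
    store.put(f"userid:{user_id}:username", username.encode("utf-8"))
    store.put(f"userid:{user_id}:fav_color", fav_color.encode("utf-8"))
    # Index entries: the searched value lives in the key, the body is empty.
    # These writes are separate requests, so they are NOT atomic.
    store.put(f"index:username:{username}:{user_id}", b"")
    store.put(f"index:fav_color:{fav_color}:{user_id}", b"")


def load_user(store, user_id):
    # The trailing ":" keeps userid:1 from also matching userid:10.
    prefix = f"userid:{user_id}:"
    return {k[len(prefix):]: store.get(k) for k in store.list_keys(prefix)}


def find_user_ids(store, field, value, exact=False):
    # Assumes values never contain ":"; otherwise they need escaping.
    prefix = f"index:{field}:{value}" + (":" if exact else "")
    return [k.rsplit(":", 1)[1] for k in store.list_keys(prefix)]


store = PrefixStore()
save_user(store, "1", "johnsmith", "red")
save_user(store, "2", "janedoe", "blue")
save_user(store, "3", "bob", "red")

starts_with_j = find_user_ids(store, "username", "j")
likes_red = find_user_ids(store, "fav_color", "red", exact=True)
```

This only covers exact-match and starts-with lookups, and every update has to keep the index keys in sync with the primary records, which is where question 3 comes in.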
quarks
  • A couple of days ago I saw this post; it may not answer all the questions, but it may give you some insight: https://stackoverflow.com/questions/56108144/using-s3-as-a-database-vs-database-e-g-mongodb – Ersoy Jun 12 '20 at 02:52
  • Amazon S3 _is_ a key-value store. The Key is the name of the object, and the Value is the contents of the object. There is no need to serialize contents since S3 can store any blob it is given. However, there are no search functions; you would need to maintain your own database to perform such operations. If you are just storing small snippets of information, then maintaining a database as an index would be more work than just storing the data in a database. However, it is excellent for storing large blobs of data. – John Rotenstein Jun 12 '20 at 04:31
  • @JohnRotenstein In Java, the S3 put request is `PutObjectRequest(String bucketName, String key, InputStream input, ObjectMetadata metadata)`, so I think primitive types have to be serialized in order to store them in S3 via the Java SDK. – quarks Jun 12 '20 at 07:01
