
I have to implement a system where a tenant can store multiple key-value stores. One key-value store can have around a million records, and a single store can have multiple columns.

[Edited] I have to store tabular data (lists with multiple columns), like Excel, where column headers are unique and there is no defined schema. The data is mostly static (updated occasionally), and we will provide a UI to handle those updates. Every tenant may store multiple tables of structured data, which they need to reference from different applications, and the contract will be JSON only.

For example, an organization/tenant wants to store an Employees list or a Country-State list, plus some custom lists specific to the product, and this data runs into millions of records.

A simple solution is to use SQL, but here the schema is not defined; it is user-defined. I have handled this in SQL before, but there were performance issues, so I want to choose a NoSQL DB that suits this requirement better.
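For context, here is a rough Python sketch of the JSON contract I have in mind for a single row (all names here are placeholders, not a fixed design):

    # One row of a user-defined table; "data" carries the dynamic columns.
    row = {
        "tenant_id": "acme-corp",    # segregates data per tenant
        "table_name": "employees",   # one of many tables a tenant can own
        "row_id": "emp-000123",      # unique key within the table
        "data": {                    # user-defined columns, free-form
            "Name": "Jane Doe",
            "Department": "Finance",
            "Country": "US",
        },
    }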

Design Constraints:

  1. Get API latency should be minimal.
  2. We can assume the Pareto rule (80:20): roughly 80% of calls are reads and 20% are writes, so it is a read-heavy application.
  3. Users can update a single record or a single column.
  4. Users can query by some column value, so we need indexes on multiple (user-defined) columns; see the sketch after this list.
  5. It's schemaless, so NoSQL seems like the natural fit. SQL also supports JSON, but updating a single row inside a JSON document is hard, and we cannot define indexes on dynamic columns.
  6. I want to segregate key-value stores per tenant; no list will be shared between tenants.
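To illustrate constraint 4: a document database like MongoDB can index dynamic columns with a wildcard index. A rough sketch with pymongo (the connection string, database, and collection names are placeholders):

    from pymongo import MongoClient, ASCENDING

    client = MongoClient("mongodb://localhost:27017")  # placeholder connection
    db = client["tenant_acme"]       # one database per tenant (constraint 6)
    employees = db["employees"]      # one collection per user-defined table

    # Wildcard index (MongoDB 4.2+) covers every user-defined column under
    # "data" without knowing the columns up front (constraint 4).
    employees.create_index([("data.$**", ASCENDING)])

    # Query by an arbitrary user-defined column value:
    for doc in employees.find({"data.Country": "US"}):
        print(doc["data"])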

One key-value store example: [sample table image]

Another key-value store example: https://datahub.io/core/country-list

I am thinking of Cassandra or any wide-column database. We could also consider a document database (MongoDB), where every collection can be a key-value store, or Amazon DynamoDB.

Cassandra: it allows you to partition data by partition key, but in my use case I may want to fetch data by different columns, and in Cassandra that means querying all partitions, which is expensive.
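A rough sketch of that trade-off with the Python driver (keyspace and table names are just examples):

    from cassandra.cluster import Cluster

    cluster = Cluster(["127.0.0.1"])            # placeholder contact point
    session = cluster.connect("tenant_tables")  # placeholder keyspace

    # Dynamic columns go into a map, since the schema is user-defined;
    # rows live under a (tenant_id, table_name) partition.
    session.execute("""
        CREATE TABLE IF NOT EXISTS rows (
            tenant_id  text,
            table_name text,
            row_id     text,
            data       map<text, text>,
            PRIMARY KEY ((tenant_id, table_name), row_id)
        )
    """)

    # Fast: single-partition read.
    session.execute(
        "SELECT * FROM rows WHERE tenant_id = %s AND table_name = %s",
        ("acme-corp", "employees"),
    )

    # Expensive: filtering on a column value has to scan, which is exactly
    # the cost described above.
    session.execute(
        "SELECT * FROM rows WHERE tenant_id = %s AND table_name = %s "
        "AND data CONTAINS %s ALLOW FILTERING",
        ("acme-corp", "employees", "US"),
    )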

  • It's not clear to me what your question is or what you're trying to achieve. Are you at least able to update your original question with a sample application query in plain English? Cheers! – Erick Ramirez Oct 27 '22 at 22:12
  • @ErickRamirez, sure let me add some more details – sobby01 Oct 28 '22 at 13:10
  • @ErickRamirez: I have edited the question; please see if it makes sense now. – sobby01 Oct 28 '22 at 17:56
  • fyi - tool/product/service recommendation questions are expressly off-topic here. Plus, as written, this really is opinion-based. There's no "right" answer to this. Oh, and this has nothing to do with any of the databases you tagged (I removed all those tags). Please don't do this with tags; those tags should only be used for questions specific to one of the databases. – David Makogon Oct 29 '22 at 22:14

2 Answers


Your example data shows duplicate items, which NoSQL databases can't store, since every item must have a unique key.

DynamoDB can handle this scenario quite efficiently; it's well suited for high read activity and delivers consistent single-digit-millisecond latency at any scale. One caveat of DynamoDB compared to the others you mention is the 400 KB item size limit.
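For example, a minimal boto3 sketch (the table name and key layout are illustrative, not a prescribed design):

    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("TenantTables")  # placeholder table name

    # One item per row, keyed by tenant+table (partition) and row id (sort).
    # The 400 KB limit mentioned above applies to each item individually.
    table.put_item(Item={
        "PK": "TENANT#acme-corp#TABLE#employees",
        "SK": "ROW#emp-000123",
        "data": {"Name": "Jane Doe", "Country": "US"},
    })

    resp = table.get_item(Key={
        "PK": "TENANT#acme-corp#TABLE#employees",
        "SK": "ROW#emp-000123",
    })
    print(resp.get("Item"))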

– Leeroy Hannigan

In order to get top performance from DynamoDB, you have to utilize the partition key as much as possible, because it gives you hash-based access (super fast).

It's obvious that a unique identifier for the user (username?) should be present in the PK, but if there is another field that you always have at request time, like the country for example, you should include it in the PK as well.

Like so:

PK                                    SK
Username#S2#Country#US#State#Georgia  Address#A1
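Querying that composite key with boto3 would look roughly like this (names are illustrative):

    import boto3
    from boto3.dynamodb.conditions import Key

    table = boto3.resource("dynamodb").Table("TenantTables")  # placeholder

    # Hash-based access on the composite PK, range condition on the SK.
    resp = table.query(
        KeyConditionExpression=(
            Key("PK").eq("Username#S2#Country#US#State#Georgia")
            & Key("SK").begins_with("Address#")
        )
    )
    for item in resp["Items"]:
        print(item)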

It might be worth storing a mapping for the countries alone so you can retrieve them before executing the heavy query. Global secondary indexes are capped at 20 per table (the default quota), so keep that in mind and reuse/overload indexes and keys as much as possible.
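As a sketch of what that overloading could look like (GSI1PK/GSI1SK are placeholder attribute names for one generic index):

    # Two different item types share the same GSI, each storing a key shape
    # that serves its own access pattern.
    country_mapping = {
        "PK": "TENANT#acme-corp",
        "SK": "COUNTRY#US",
        "GSI1PK": "COUNTRYLIST#acme-corp",  # list all countries for a tenant
        "GSI1SK": "US",
    }
    employee_row = {
        "PK": "TENANT#acme-corp#TABLE#employees",
        "SK": "ROW#emp-000123",
        "GSI1PK": "DEPT#Finance",           # same index, different pattern
        "GSI1SK": "emp-000123",
    }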

Stick to single-table design to make better use of this.

As mentioned by Leeroy Hannigan, duplicated elements are not supported; all keys (including those of the indexes) must be unique pairs.

– Borislav Stoilov