3

I'm used to MySQL and now trying to grasp an understanding of how to use key value stores. What I have not seen are good noob like examples of database design and how you would insert and get information.

Is this a correct representation of how you would store data from MySQL in a key value store?

TYPE: MySQL
TABLE: users
COLUMNS: user_id(primary), username, location

TYPE: Key Value Store
TABLE: users
KEY: user_id
VALUES: username, location

So, if I am correct above. Pulling general user information is simple enough to understand. But how would I preform the following query in a key value store?

SELECT username FROM users WHERE location = 'mexico'

The way I thought you could do this easily is to create another table. (assume there are over 5,000 users, im sure there are other ways to do this if you only had a couple hundred)

--Original Table--
TYPE: Key Value Store
TABLE: users
KEY: user_id
VALUES: username, location

--Additional "query" Table--
TYPE: Key Value Store
TABLE: user-location
KEY: location
VALUES: user_id

However, now we need to adjust two tables when someone new joins, updates their location, etc. It's not a huge deal I suppose, you just have to be super accurate with your applications code.

Is this the best way to solve these problems? Or am I missing something?

Adam
  • 723
  • 10
  • 22
  • 1
    Generally, NoSQL data-stores offer less features. They are not exactly known for developer productivity. – usr Mar 17 '12 at 21:34

3 Answers3

2

Updated Answer (Jan-2014)

DynamoDB started supporting Global Secondary Indices which means you can now put an index on the location and fast retrieve only those live in mexico.

Note that at the time of writing (this might change) you can't add indices to existing tables.

Original Answer (Mar-2013)

Notes on NoSQL in general:
NoSQL DBMS usually focus on scalability.
They also usually add an application overhead in terms of more server side code.

You should ask yourself "how many times would I need to query the users from mexico"
The answer will likely direct you in the right approach when modeling your database.
That is also the reason there are no "perfect-fits" and no really "noob samples" (at least to my knowledge)

Now looking at DynamoDB in particular, you don't have the luxury of secondary indexes (as opposed to other NoSQL solutions that do have), so you need to create tables-as-indexes. In your model, you can create a table where the hash key is the location and the range key is the user id. Thus with a QUERY API call you can get all the MEXICO users.

You can think of other implementations as well such as keeping the ids concatenated in a single object, but again, since DynamoDB only allows 64KB objects - you will probably run into a scaling issue here.

Chen Harel
  • 9,684
  • 5
  • 44
  • 58
1

Don't manage separate index tables yourself.

Instead use the new global secondary index feature.

Ken Lin
  • 1,819
  • 21
  • 21
0

If your design is such that you end up doing lot of lookups based on location then you should re-design the user table with Location as hashkey and userId as range key. But above way removes the capability to query the users on their name or userID, also while inserting new user can't check for uniqueness in userID (contradictory to what primary key in MySql was doing).

Now if you don't search based on location quite often, then performing a scan operation might be a better solution.

The best approach would be as you mentioned to do all these processing on the API level based on your need.

Ashwin Patti
  • 588
  • 1
  • 3
  • 8