
I am writing a Node.js application that relies on Redis as its main database, and user info is stored in this database.

I currently store the user data (email, password, date created, etc.) in a hash named user:(incremental uid), along with a key email:(email) whose value is that same incremental uid.

When someone logs in, the app looks up email:(email) to get the (incremental uid), then uses it to access the user data in user:(incremental uid).
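In code, the current flow looks roughly like the sketch below (assuming the node-redis v4 client; the helper names are just illustrative):

    import { createClient } from 'redis';

    const client = createClient();
    await client.connect();

    // Signup: store the user hash plus the email -> uid index key.
    async function createUser(uid, email, passwordHash) {
      await client.hSet(`user:${uid}`, {
        email,
        password: passwordHash,
        created: Date.now().toString(),
      });
      await client.set(`email:${email}`, uid.toString());
    }

    // Login: resolve email -> uid, then load the user hash.
    async function findUserByEmail(email) {
      const uid = await client.get(`email:${email}`);
      if (!uid) return null;
      return client.hGetAll(`user:${uid}`);
    }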

This works great; however, if the number of users reaches into the millions (possible, but a somewhat distant issue), the database size will increase dramatically and I'll start running into problems.

I'm wondering how to hash an email down to an integer that I can use to sort emails into hash buckets, like this (pseudocode):

hash(thisguy@somedomain.com) returns 1234  
1234 % 3 or something returns 1
store { thisguy@somedomain.com : (his incremental uid) } in hash emailbucket:1

Then when I need to look up the uid for thisguy@somedomain.com, I use a similar procedure:

hash(thisguy@somedomain.com) returns 1234  
1234 % 3 or something returns 1
lookup thisguy@somedomain.com in hash emailbucket:1 returns his (incremental uid)
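In Node.js terms, the idea would look something like the sketch below (purely illustrative, assuming the built-in crypto module for the hash and the node-redis v4 client; the bucket count of 3 is just the placeholder from the pseudocode):

    import { createHash } from 'crypto';
    import { createClient } from 'redis';

    const client = createClient();
    await client.connect();

    const NUM_BUCKETS = 3; // placeholder; a real deployment would use a larger, fixed count

    // Hash the email down to an integer, then reduce it to a bucket index.
    function emailBucket(email) {
      const hex = createHash('md5').update(email).digest('hex');
      const n = parseInt(hex.slice(0, 8), 16); // first 32 bits of the digest as an integer
      return n % NUM_BUCKETS;
    }

    // store { email : uid } in hash emailbucket:<bucket>
    async function indexEmail(email, uid) {
      await client.hSet(`emailbucket:${emailBucket(email)}`, email, uid.toString());
    }

    // look up email in hash emailbucket:<bucket> to get the uid back
    async function lookupUid(email) {
      return client.hGet(`emailbucket:${emailBucket(email)}`, email);
    }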

So, my questions in list form:

  1. Is this practical / is there a better way?
  2. How can I hash the email to a few digits?
  3. What is the best way to organize these hashes into buckets?
evan.bovie

2 Answers

  1. It probably won't end up mattering that much. Redis doesn't have an integer type (values are stored as strings), so you're only saving yourself a few bytes, and less each time your counter rolls over to the next digit. Doing some napkin math, at a million users the difference in actual storage would be ~50 MB, i.e. on the order of 50 bytes per user. With hard drives at under $1/GB, it's not worth the time it would take to implement.
  2. As a thought experiment, you could maintain a key that holds your current user counter and simply INCR it each time you add a new user (INCR is atomic and returns the new value, so a separate GET isn't needed); see the sketch below.
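A minimal sketch of that counter, assuming the node-redis v4 client (the key name is illustrative):

    import { createClient } from 'redis';

    const client = createClient();
    await client.connect();

    // Allocate the next incremental uid. INCR creates the key (starting at 0)
    // if it doesn't exist yet, so the first uid returned is 1.
    async function nextUid() {
      return client.incr('user:next_id');
    }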
Tim Brown

Yes, this is a good way to store millions of key-value pairs in hashes. You need to design the bucketing scheme yourself. For example, you could derive the bucket from a timestamp, or roll over to a new bucket after every 1000 values (see the sketch below). There are many other ways.

Read this article for more detail: http://instagram-engineering.tumblr.com/post/12202313862/storing-hundreds-of-millions-of-simple-key-value
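For example, bucketing by the numeric id itself, roughly along the lines of the linked article (a sketch assuming the node-redis v4 client; the bucket size of 1000 and the key names are illustrative):

    import { createClient } from 'redis';

    const client = createClient();
    await client.connect();

    const BUCKET_SIZE = 1000; // start a new hash for every 1000 ids

    // e.g. id 1234 -> bucket 1 -> HSET bucket:1 1234 <value>
    async function storeById(id, value) {
      await client.hSet(`bucket:${Math.floor(id / BUCKET_SIZE)}`, id.toString(), value);
    }

    async function lookupById(id) {
      return client.hGet(`bucket:${Math.floor(id / BUCKET_SIZE)}`, id.toString());
    }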

NeiL
  • Whilst this may theoretically answer the question, [it would be preferable](//meta.stackoverflow.com/q/8259) to include the essential parts of the answer here, and provide the link for reference. – Tunaki Feb 12 '16 at 11:36