
I want to store users' profiles in Redis, since I have to frequently read multiple users' profiles. There are two options I see at present:

Option 1: store a separate hash key per user profile

  • [hash] - u1 profile {id: u1, name:user1, email:user1@domain.com, photo:url}
  • [hash] - u2 profile {id: u2, name:user2, email:user2@domain.com, photo:url}
  • where each user's id is the hash key, "profile" is the field, and the value is the JSON-serialized profile object (or, instead of JSON, the profile attributes as field-value pairs)

Option 2: use a single hash key to store all user profiles

  • [hash] - users-profile u1 {id: u1, name:user1, email:user1@domain.com, photo:url}
  • [hash] - users-profile u2 {id:u2, name:user2, email:user2@domain.com, photo:url}
  • where users-profile is the hash key, user ids are the fields, and the values are JSON-serialized profile objects.

Please tell me which option is best, considering the following:

  1. performance
  2. memory utilization
  3. reading multiple user profiles - for batch processing, I should be able to read profiles 1-100, 101-200, etc. at a time
  4. larger dataset - what if there are millions of user profiles?
Suyog Kale
    One hash for all users is not a good solution by any criterion. – Sergio Tulentsev Jan 17 '17 at 12:56
  • @SergioTulentsev, thanks for your response. Do you have any suggestion on how to read multiple user profile keys by range/paging? I am using Node.js as the client application. – Suyog Kale Jan 18 '17 at 08:54
  • SCAN + HMGET and send them in a pipelined manner. – Sergio Tulentsev Jan 18 '17 at 10:23
  • You need to add more details: 1) how much information do you want to store in a profile? 2) what type of batch loading do you want? Do you know the keys beforehand, or do you want to query 'all users'? – Imaskar Aug 31 '18 at 05:31

3 Answers


As Sergio Tulentsev pointed out, it's not good to store all the users' data (especially if the dataset is huge) inside one single hash by any means.

Storing the users' data as individual keys is also not preferred if you're looking for memory optimization, as pointed out in this blog post.

Reading the users' data with a pagination mechanism calls for a database rather than a simple caching system like Redis. Hence it's recommended to use a NoSQL database such as MongoDB for this.

But reading from the database each time is a costly operation, especially if you're reading a lot of records.

Hence the best solution would be to cache the most active users' data in Redis to eliminate the database fetch overhead.

I recommend looking into walrus.

It basically follows this pattern:

@cache.cached(timeout=expiry_in_secs)
def function_name(param1, param2, ...., param_n):
    # perform database fetch
    # return user data

This ensures that frequently accessed or requested user data stays in Redis, and the function automatically returns the value from Redis rather than making a database call. The cached value also expires after the configured timeout, so stale entries don't linger.

You set it up as follows:

from walrus import *
db = Database(host='localhost', port=6379, db=0)
cache = db.cache()  # the cache object whose .cached() decorator is used above

where host can take the domain name of a Redis instance or cluster running remotely.
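For example, a minimal sketch building on that setup (the get_user_profile and fetch_profile_from_db names, the stub data, and the 300-second expiry are illustrative assumptions, not part of walrus itself):

def fetch_profile_from_db(user_id):
    # stand-in for the real database query (e.g. a MongoDB lookup)
    return {'id': user_id, 'name': 'user_' + str(user_id)}

@cache.cached(timeout=300)  # cache each result for 5 minutes (illustrative expiry)
def get_user_profile(user_id):
    return fetch_profile_from_db(user_id)

# the first call hits the database; repeat calls within the timeout are served from Redis
profile = get_user_profile('u1')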

Hope this helps.

Adarsh

Option #1.

  • Performance: Typically it depends on your use case, but let's say that you want to read a specific user (for login/logout, authorization purposes, etc.). With option #1, you simply compute the user's key and get the user profile. With option #2, you will need to get all the users' profiles and parse the JSON (although you can make it efficient, it will never be as efficient and simple as option #1);

  • Memory utilization: You can make option #1 and option #2 take about the same size in Redis (in option #1, you can avoid storing the user id as part of the JSON). However, picking the same example of loading a specific user, with option #1 you only need to hold a single user-profile JSON in code/memory instead of a bigger JSON with a whole set of user profiles;

  • read multiple user profiles - for batch processing I should be able to read users 1-100, 101-200 at a time: For this, as is typically done with a relational database, you want to do paging. There are different ways of doing paging with Redis, but using a SCAN operation is an easy way to iterate over a set of users (see the sketch after this list);

  • larger dataset - what if there are millions of user profiles:

Redis is an in-memory but persistent on disk database, so it represents a different trade off where very high write and read speed is achieved with the limitation of data sets that can't be larger than memory

If you "can't have a dataset larger the memory", you can look to Partitioning as the Redis FAQ suggests. On the Redis FAQ you can also check other metrics such as the "maximum number of keys a single Redis instance can hold" or "Redis memory footprint"

r.pedrosa

PROS for option 1

(But don't use a hash; use a plain key per user, like SET profile:4d094f58c96767d7a0099d49 {...}. See the sketch after the list below.)

  • Iterating keys is slightly faster than iterating a hash. (That's also why you should modify option 1 to use SET, not HSET.)
  • Retrieving a key's value is slightly faster than retrieving a hash field.
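A small sketch of this plain-key variant, assuming redis-py (the profile:<id> key names are just an example):

import json
import redis

r = redis.Redis(host='localhost', port=6379, db=0, decode_responses=True)

# one plain string key per user; the value is the JSON-serialized profile
r.set('profile:u1', json.dumps({'id': 'u1', 'name': 'user1', 'email': 'user1@domain.com'}))

# single read
profile = json.loads(r.get('profile:u1'))

# batch read when the ids are known up front: MGET fetches many keys in one round trip
keys = ['profile:u%d' % i for i in range(1, 101)]
profiles = [json.loads(raw) for raw in r.mget(keys) if raw]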

PROS for option 2

  • You can get many users in a single HMGET call (or all of them with HGETALL), but only if your user base is not very big. Otherwise it can be too hard for the server to serve you the result (see the sketch after this list).
  • You can flush all users with a single command (just delete the hash key). Useful if you have a backing DB.
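Continuing the sketch above (same r client and json import), option 2 reads and the flush might look like this; the users-profile key name comes from the question:

# fetch a known batch of users from the single hash in one call
batch = [json.loads(raw) for raw in r.hmget('users-profile', ['u1', 'u2', 'u3']) if raw]

# or everything at once (only reasonable while the hash is still small)
everyone = {field: json.loads(value) for field, value in r.hgetall('users-profile').items()}

# drop the whole cache with one command, e.g. before re-warming from the backing DB
r.delete('users-profile')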

PROS for option 3

Option 3 is to break your user data into hash buckets determined by a hash of the user id. It works well if you have many users and do batches often. Like this:

HSET profiles:<bucket> <id> {json object}
HGET profiles:<bucket> <id>
HGETALL profiles:<bucket>

The last one gets a whole bucket of profiles. I don't recommend letting a bucket grow beyond 1 MB in total. It works well with sequential ids, not so well with hashed ids, because buckets can grow too much. If you used it with hashed ids and a bucket grew so much that it slows down your Redis, you can fall back to HSCAN (like in option 2) or redistribute objects into more buckets with a new hash function.

  • Faster batch load
  • Slightly slower single object store/load

My recommendation, if I understood your situation right, is to use the 3rd option with sequential ids in buckets of 100. And if you're aiming at high amounts of data, plan for a cluster from day one.
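If I may sketch that recommendation with redis-py (sequential integer ids, buckets of 100; the profiles:<n> key names and helper functions are mine, not the asker's):

import json
import redis

r = redis.Redis(host='localhost', port=6379, db=0, decode_responses=True)

BUCKET_SIZE = 100  # users 1-100 share one bucket, 101-200 the next, and so on

def bucket_key(user_id):
    return 'profiles:%d' % ((user_id - 1) // BUCKET_SIZE)

def save_profile(user_id, profile):
    r.hset(bucket_key(user_id), user_id, json.dumps(profile))

def load_profile(user_id):
    raw = r.hget(bucket_key(user_id), user_id)
    return json.loads(raw) if raw else None

def load_bucket(first_id):
    # HGETALL pulls the whole bucket, i.e. one 100-profile batch, in a single call
    return {int(field): json.loads(value) for field, value in r.hgetall(bucket_key(first_id)).items()}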

Imaskar