3

I am planning to use Redis as a cache for an already existing database (MS SQL). I would like to serve the front end with data from Redis. I will be dealing with a huge amount of data, around 100GB a day, mostly in tables containing a time value and some counter values (roughly 10-100 columns). How would Redis perform if I need to aggregate that much data by hour, day, etc. (i.e. based on the time column)? Is Redis the right way to do it, or are there any alternatives? I don't know how well NoSQL handles aggregation compared to an RDBMS. And how would MongoDB do in such a scenario?

Thanks

  • 1
    Will you be dumping all data into Redis (probably a bad idea), or writing a wrapper that puts some data into Redis and expires it after some time? Also, I'm waiting for answers to the question of whether Redis is the best way to do this. – automaticAllDramatic Apr 23 '13 at 12:18
  • I will be holding data for a max of 1 month. – Joe Dominic Valluvassery Apr 23 '13 at 13:05
  • 1
    keep in mind that you have to fit all data stored in redis into RAM – Tommaso Barbugli Apr 23 '13 at 14:43
  • Yeah, that's why it's very expensive in production. – automaticAllDramatic Apr 24 '13 at 05:00
  • @JoeDominicValluvassery So you want to store about 100GB in redis, pull data from redis to aggregate in memory, and deliver the aggregated data to the consumer? How large will these aggregations get? – raffian May 08 '13 at 21:11
  • @SAFX: Exactly. Aggregation won't be complex; it's basically just a group-by on hour, day, etc. – Joe Dominic Valluvassery May 09 '13 at 05:40
  • @JoeDominicValluvassery Do you expect your data set to grow beyond 100GB? – raffian May 09 '13 at 15:01
  • @SAFX: Just came across this other [post](http://stackoverflow.com/questions/10004565/redis-10x-more-memory-usage-than-data/10008222#10008222) about Redis taking up 10x more memory than the size of the data. So if the data set is 100GB, Redis might need much more than that?! – 2020 May 17 '13 at 22:18
  • 1
    @brainOverflow Yes, but there are methods by which to conserve space in redis. At our firm, we store data mostly in string/value pairs and sorted sets; in both cases, we use MessagePack to serialize data before storing in redis. Based on our tests, 1Gb of data compresses down to 250Mb in redis with MessagePack. – raffian May 17 '13 at 23:11
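A minimal sketch of the serialize-before-storing idea from the comment above, in Python with the redis and msgpack packages; the key name and counter fields are illustrative assumptions, not anything from the thread:

import msgpack
import redis

r = redis.Redis(host="localhost", port=6379)

# a row of counters for one timestamp, roughly as it might come out of MS SQL
row = {"ts": 1366718400, "counter_a": 42, "counter_b": 17}

# pack to a compact binary blob before writing to redis
r.set("counters:1366718400", msgpack.packb(row))

# unpack on the way back out
restored = msgpack.unpackb(r.get("counters:1366718400"), raw=False)
print(restored["counter_a"])  # 42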

1 Answer

4

If you need to store 100GB and you don't expect your data set to grow much beyond that, start with 3 redis instances, each with 64GB of RAM, for a total of 192GB, more than enough to hold your data set with room to grow.

Each redis instance will be a master, so your data will be split amongst the instances equally. You'll need to shard across the instances from the application layer using a simple hashing algorithm, for instance...

// from your application layer
shardKey = "redis" + getShardKey(cacheKey);
redisConnection = getRedisConnectionByShardKey(shardKey);
// do work with redisConnection here

The function getShardKey(string) takes the cacheKey, converts it to an integer, then mods it by the number of redis instances, returning 0, 1, or 2. Configure a connection pool for each redis instance and give each one a name like redis0, redis1, etc. After you call the hash function, use the shard key to get a connection for the target redis instance. Once you have the data you need, do the aggregation in your application layer.
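A minimal sketch of that sharding logic in Python with the redis package; the host names and the CRC-based hash are assumptions made for illustration, not something the answer prescribes:

import zlib
import redis

# one client (backed by its own connection pool) per redis instance: redis0, redis1, redis2
SHARDS = [
    redis.Redis(host="redis0.example.local", port=6379),
    redis.Redis(host="redis1.example.local", port=6379),
    redis.Redis(host="redis2.example.local", port=6379),
]

def get_shard_key(cache_key):
    # convert the cache key to an integer, then mod by the number of instances -> 0, 1, or 2
    return zlib.crc32(cache_key.encode("utf-8")) % len(SHARDS)

def get_redis_connection(cache_key):
    return SHARDS[get_shard_key(cache_key)]

# usage: all reads and writes for a given cache key always hit the same instance
conn = get_redis_connection("counters:2013-04-23T12")
conn.set("counters:2013-04-23T12", "...")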

This is a simple approach; it distributes data more or less equally amongst the redis instances and avoids stuffing everything into a single instance. Redis is single-threaded, so if you're doing lots of I/O you'll be bound by how fast your CPU can service requests; using multiple instances distributes that load.
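Because the answer leaves the aggregation to the application layer, here is one possible sketch of the hour-level roll-up the question asks about. It assumes each shard holds per-minute counter rows in a sorted set scored by Unix timestamp and packed with MessagePack (the key layout and field names are assumptions), and it reuses get_redis_connection from the sketch above:

import msgpack

def aggregate_hour(conn, key, hour_start_ts):
    # fetch every per-minute row whose timestamp score falls inside the hour
    hour_end_ts = hour_start_ts + 3599
    rows = conn.zrangebyscore(key, hour_start_ts, hour_end_ts)

    totals = {}
    for packed in rows:
        row = msgpack.unpackb(packed, raw=False)
        for column, value in row.items():
            if column == "ts":
                continue
            totals[column] = totals.get(column, 0) + value
    return totals

# e.g. roll up the hour starting at 2013-04-23 12:00 UTC on the shard that owns this key
conn = get_redis_connection("counters:minutely")
hourly_totals = aggregate_hour(conn, "counters:minutely", 1366718400)

A day-level roll-up is the same loop with an 86,400-second window, or simply a sum over the 24 hourly results.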

This solution breaks down when your data set grows beyond 180GB. If you add another redis instance to accommodate a larger data set, the hash function must be updated to use modulo 4 instead of 3, and you'll have to move most of your data around. This gets ugly, so use this approach only if you're 100% sure the data set will stay below 150GB.

raffian
  • 2
    +1 for your input: `There are methods by which to conserve space in redis. At our firm, we store data mostly in string/value pairs and sorted sets; in both cases, we use MessagePack to serialize data before storing in redis. Based on our tests, 1Gb of data compresses down to 250Mb in redis with MessagePack.` – 2020 May 17 '13 at 23:21