If you need to store 100 GB and you don't expect your data set to grow much beyond that, start with 3 Redis instances, each with 64 GB of RAM, for a total of 192 GB: more than enough to hold your data set, with room to grow.
Each Redis instance will be a master, so your data will be split equally amongst the instances. You'll need to shard across the instances from the application layer using a simple hashing algorithm, for example:
    // from your application layer
    shardKey = "redis" + getShardKey(cacheKey);
    redisConnection = getRedisConnectionByShardKey(shardKey);
    // do work with redisConnection here
The function getShardKey(string) takes the cacheKey, converts it to an integer, then mods it by the number of Redis instances, returning 0, 1, or 2. Configure a connection pool for each Redis instance and give each one a name like redis0, redis1, etc. After you call the hash function, use the shard key to look up the connection pool for the target instance. Once you have the data you need, do the aggregation in your application layer.
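Here's a minimal sketch of that idea in Python with redis-py; the host addresses and the CRC32-based hash are just assumptions for illustration, so swap in whatever matches your setup.

    import zlib
    import redis

    NUM_SHARDS = 3

    # One connection pool per Redis instance, named redis0..redis2.
    # Hosts are placeholders; point these at your actual instances.
    pools = {
        f"redis{i}": redis.ConnectionPool(host=f"10.0.0.{10 + i}", port=6379)
        for i in range(NUM_SHARDS)
    }

    def get_shard_key(cache_key: str) -> int:
        # Convert the cache key to an integer, then mod by the number of instances.
        return zlib.crc32(cache_key.encode("utf-8")) % NUM_SHARDS

    def get_redis_connection_by_shard_key(shard_key: str) -> redis.Redis:
        return redis.Redis(connection_pool=pools[shard_key])

    cache_key = "user:12345"                          # hypothetical key
    shard_key = f"redis{get_shard_key(cache_key)}"
    conn = get_redis_connection_by_shard_key(shard_key)
    value = conn.get(cache_key)                       # do work with the connection here

Any stable string-to-integer hash works in place of CRC32; the only requirement is that every client uses the same one.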
This is a simple approach; it distributes data (more or less) equally amongst the Redis instances and avoids stuffing everything into a single instance. Redis is single-threaded, so if you're doing lots of I/O you'll be bound by how fast your CPU can service requests; using multiple instances spreads that load.
This solution breaks down when your data set grows beyond 180 GB. If you add another Redis instance to accommodate a larger data set, the hash function must be updated to mod by 4 instead of 3, and you'll have to move most of your data around. That gets ugly, so use this approach only if you're 100% sure the data set will stay below 150 GB.
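To put a number on "most of your data": with a uniform hash, only keys whose hashed value gives the same result mod 3 and mod 4 stay on the same instance, which works out to about a quarter of them. A quick sketch (hypothetical keys, same CRC32-style hash assumed above):

    import zlib

    def shard(key: str, n: int) -> int:
        return zlib.crc32(key.encode("utf-8")) % n

    keys = [f"user:{i}" for i in range(100_000)]
    moved = sum(1 for k in keys if shard(k, 3) != shard(k, 4))
    print(f"{moved / len(keys):.0%} of keys map to a different instance")  # roughly 75%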