What is the difference in space between sorted sets and lists in Redis? My guess is that sorted sets are some kind of balanced binary tree and lists are linked lists. Each entry encodes three values: key, score, and value (although for the linked list I'll munge score and value together). On top of that, a linked-list node needs to keep track of one other node, while a binary-tree node needs to keep track of two, so the extra space cost of using a sorted set is O(N).

If my value and score are both longs, and the pointers to other nodes are also longs, then on a 64-bit machine a single node seems to grow from 3 longs to 4 longs, which is a 33% increase in space.

Is this true?

nnythm
  • There is a lot of info about this in [redis docs on memory optimization](http://redis.io/topics/memory-optimization), warmly recommended reading. I'm not sure whether you're suggesting using a list as a makeshift sorted set? Surely not. – Linus Thiel Sep 04 '12 at 19:15
  • @LinusGThiel I read the redis docs on memory optimization, and they mention neither sorted sets nor lists, except to say that you can use ziplists if they're small. I am almost certainly going to use a list as a makeshift sorted set, because my "score" is a timestamp, so I can just push and maintain the sort order. – nnythm Sep 04 '12 at 19:38
  • All right, just be aware of the time complexity when retrieving from different parts of lists/sorted sets. If you don't get great answers here on SO, I recommend you direct this question to the mailing list, whose members are usually very knowledgeable and accommodating. – Linus Thiel Sep 04 '12 at 21:25
  • The docs have all of the time complexities, just none of the space information. – nnythm Sep 04 '12 at 21:46

1 Answer

It is much more than your estimation. Let's suppose ziplists are not used (i.e. you have a significant number of items).

A Redis list is a classical doubly linked list: 3 pointers (prev, next, value) per item.

A sorted set is a dictionary plus a skip list. In the dictionary, each item is stored with 3 pointers as well (key, value, next). The skip list's memory footprint is harder to evaluate: each node takes 1 double (score), 2 pointers (obj, backward), plus n (pointer, span) pairs, with n between 1 and 32. Most items take only 1 or 2 pairs.

In other words, when it is not represented as a ziplist, a sorted set is by far the Redis data structure with the most overhead. Compared to a list, it needs more than 200% extra memory per item (i.e. roughly 3 times as much).
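The field counts above can be turned into a rough per-item estimate. The following is only a back-of-the-envelope sketch, not how Redis itself accounts for memory: it assumes 8-byte pointers and doubles (a 64-bit build), 16 bytes per skip list level once pointer and span are aligned, and a hypothetical probability `p` of 0.25 for a node to gain each extra level. All function names are made up for illustration.

```python
# Rough per-item sizes, following the field counts described in the answer.
# Assumptions (labelled, not taken from the answer): 8-byte pointers and
# doubles, and 16 bytes per skip list level (pointer + span after alignment).
POINTER = 8
DOUBLE = 8
LEVEL = 16


def list_node_bytes():
    """Doubly linked list node: prev, next and value pointers."""
    return 3 * POINTER


def dict_entry_bytes():
    """Dictionary entry: key, value and next pointers."""
    return 3 * POINTER


def skiplist_node_bytes(levels):
    """Skip list node: score, obj and backward pointers, plus one
    (pointer, span) pair per level."""
    if not 1 <= levels <= 32:
        raise ValueError("levels must be between 1 and 32")
    return DOUBLE + 2 * POINTER + levels * LEVEL


def sorted_set_item_bytes(levels):
    """One sorted set member: a dictionary entry plus a skip list node."""
    return dict_entry_bytes() + skiplist_node_bytes(levels)


def expected_levels(p=0.25, max_level=32):
    """Mean node height if each extra level is added with probability p
    (an assumed parameter), capped at max_level."""
    # E[height] = sum over k >= 0 of P(height > k) = sum of p**k
    return sum(p ** k for k in range(max_level))


if __name__ == "__main__":
    avg = expected_levels()
    zset = dict_entry_bytes() + DOUBLE + 2 * POINTER + avg * LEVEL
    print("list node:       %d bytes" % list_node_bytes())
    print("sorted set item: %.1f bytes on average" % zset)
    print("ratio:           %.2f" % (zset / list_node_bytes()))
```

These figures leave out allocator overhead, the dictionary's bucket array, and the stored objects themselves, so real usage will be higher than the sketch suggests.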

Note: the best way to evaluate memory consumption with Redis is to build a big list or sorted set with dummy data and use the INFO command to get the memory footprint.
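One way to turn that measurement into a per-item number is to capture the INFO output before and after loading the test data and divide the growth in `used_memory` by the number of items. A minimal sketch, assuming the INFO text has already been saved (for instance from redis-cli); the helper names and the sample figures are invented for illustration and are not real measurements:

```python
def parse_info(text):
    """Parse INFO output ("key:value" lines, "# Section" headers) into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def bytes_per_item(info_before, info_after, item_count):
    """Growth of used_memory (in bytes) between two INFO snapshots,
    divided by the number of items inserted in between."""
    before = int(parse_info(info_before)["used_memory"])
    after = int(parse_info(info_after)["used_memory"])
    return (after - before) / item_count


# Made-up snapshots, only to show the expected input shape.
SAMPLE_BEFORE = "# Memory\r\nused_memory:1000000\r\nused_memory_human:976.56K\r\n"
SAMPLE_AFTER = "# Memory\r\nused_memory:25000000\r\nused_memory_human:23.84M\r\n"

if __name__ == "__main__":
    print(bytes_per_item(SAMPLE_BEFORE, SAMPLE_AFTER, 1000000))
```

Taking the difference between two snapshots, rather than reading a single value, keeps the server's baseline memory out of the per-item figure.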

Didier Spezia
  • Hey, I must be misinterpreting the info command's output. I did redis-benchmark -q -n 1000000 zadd sortedset rand:00000000000 ele:rand:000000000000, and then called the command INFO in the redis-cli, and got a used_memory_peak_human of 1.30M. However, I am doing one million sorted set adds, so there clearly must be more than a megabyte of data. What am I missing? – nnythm Sep 05 '12 at 03:33
  • Several problems: the score must be numeric (it is not, because of the rand prefix), and a sorted set guarantees uniqueness, so you need to randomize the keys (see the -r option of redis-benchmark). – Didier Spezia Sep 05 '12 at 08:09
  • With ./redis-benchmark -q -r 10000000000 -n 1000000 zadd sortedset 0 ele:rand:000000000000, I get 135 MB for 1M items. – Didier Spezia Sep 05 '12 at 08:15