3

After reading "Can someone explain redis setbit command?" and http://blog.getspool.com/2011/11/29/fast-easy-realtime-metrics-using-redis-bitmaps/ (referenced in the Redis docs), I'm still struggling to identify use cases for SETBIT over SET.

The above sources seem to cite one driving factor: storing events and 'countable' datasets as bits dramatically reduces the amount of data you need to store, while still retaining ease of access.

Is storing daily unique visits to a website by user ID (each user identified by a bit offset starting at 0) in a bitmap such as 100000001, where the users with IDs 0 and 8 are the only ones with a visit, better than just setting timestamp : userID? Please explain. Thank you.

My apologies for this being so obviously a neophyte question.

sjt003

2 Answers

2

Bits are the basic data units that computers use, and Redis' BIT* commands allow easy manipulation of bit values. In the example that the OP provided, the use of a bitmap will primarily result in savings in terms of space.

Keeping a key for each login will cost (at least) the size of the key and value, totaling about 10 bytes, whereas a bitstream will require just 1 bit for every user.
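To make the "1 bit per user" point concrete, here is a minimal sketch in Python. It does not talk to Redis: the `Bitmap` class is a hypothetical in-memory stand-in that imitates how the real SETBIT, GETBIT and BITCOUNT commands treat a string value, assuming Redis's convention that offset 0 is the most significant bit of the first byte.

```python
class Bitmap:
    """Hypothetical in-memory stand-in for a Redis string used as a bitmap."""

    def __init__(self):
        self.data = bytearray()

    def setbit(self, offset, value):
        """Imitate SETBIT: grow to cover `offset`, set the bit, return the old bit."""
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self.data):
            # Pad with zero bytes up to the byte that holds `offset`.
            self.data.extend(bytes(byte_index + 1 - len(self.data)))
        mask = 0x80 >> bit_index  # offset 0 is the most significant bit
        old = 1 if self.data[byte_index] & mask else 0
        if value:
            self.data[byte_index] |= mask
        else:
            self.data[byte_index] &= ~mask & 0xFF
        return old

    def getbit(self, offset):
        """Imitate GETBIT: bits beyond the stored length read as 0."""
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(self.data):
            return 0
        return 1 if self.data[byte_index] & (0x80 >> bit_index) else 0

    def bitcount(self):
        """Imitate BITCOUNT over the whole value."""
        return sum(bin(byte).count("1") for byte in self.data)


# The OP's example: only the users with IDs 0 and 8 visited today.
visits = Bitmap()
visits.setbit(0, 1)
visits.setbit(8, 1)

unique_visitors = visits.bitcount()
bitmap_size_in_bytes = len(visits.data)
```

With a real server the equivalent calls would be `SETBIT visits 0 1`, `SETBIT visits 8 1` and `BITCOUNT visits`; the key name `visits` is only an illustration.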

Itamar Haber
  • In order to fully answer the question, can you define any additional, concrete use cases where this level of storage and manipulation would be a definite advantage? – sjt003 May 18 '15 at 14:13
  • so is this basically restricting the saved data to that which can be represented in boolean values? – sjt003 May 21 '15 at 18:45
  • Isn't our age proof that nearly everything can be digitized? :) – Itamar Haber May 21 '15 at 19:15
  • I'm speaking of data types--specifically like storing strings vs ints vs boolean values vs json docs vs whatever -- this setbit system forces you to think about your schema and model your data storage in a very particular way, a way that may not be appropriate for every data set. My question was specific and I'm sort of looking for specifics, not grand statements like the above. I'm just trying to keep the conversation moving in the theme of SO. Thanks. – sjt003 May 21 '15 at 19:27
  • Sure, it's just my weird sense of humor :) Bits are certainly an obvious choice for Booleans, but you can just as easily store ints (i.e. 8-bit, 16-bit...) or any other format you wish. – Itamar Haber May 21 '15 at 20:08
2

The answer is: it depends. In the above use case it depends, for example, on how many logins you have per day (how many bits are set in the bitmask). If you have, for example, only 2 logins, or user IDs that are scattered randomly over a large range, it might be better to just store a LIST of logins.

But if you have an active user base and 60% of all users are active, it turns out that storing 1 bit per user (actually less than that on average, because Redis only stores the bitmask up to the highest set bit) is much more memory-friendly than storing IDs in a list. Storing IDs in a list uses, for example, 32 bits (an integer) to represent 1 bit of information, which is wasteful. It might be even more if the list is implemented as some kind of tree with explicit pointers to related nodes. Because RAM is fairly expensive and limited, and we want things to be scalable as well, one should aim for minimal memory usage while still meeting all query requirements.
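The trade-off can be checked with rough arithmetic. The sketch below assumes 4 bytes (32 bits) per stored ID, as in the paragraph above, and ignores all per-key and per-element overhead inside Redis, so the numbers are only an approximation. The function names and the one-million-user figure are made up for illustration.

```python
def bitmap_bytes(highest_active_id):
    """Bytes needed for a bitmap that reaches up to the highest set bit."""
    return highest_active_id // 8 + 1


def id_list_bytes(active_users, bytes_per_id=4):
    """Bytes needed to store each active ID as a fixed-size integer (assumed 32-bit)."""
    return active_users * bytes_per_id


total_users = 1_000_000  # illustrative user base with IDs 0..999,999

# Dense case: 60% of users active, and the highest ID is among them.
dense_bitmap = bitmap_bytes(total_users - 1)
dense_list = id_list_bytes(int(total_users * 0.6))

# Sparse case: only 2 logins, one of them from the highest ID.
sparse_bitmap = bitmap_bytes(total_users - 1)
sparse_list = id_list_bytes(2)

print(f"dense:  bitmap {dense_bitmap:,} B vs list {dense_list:,} B")
print(f"sparse: bitmap {sparse_bitmap:,} B vs list {sparse_list:,} B")
```

Under these assumptions the bitmap wins once more than roughly 1 in 32 of the IDs in the covered range are active, and the plain list wins below that.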

So this is something I would decide from use case to use case.

However, using bitmasks allows for very fast bulk filtering of huge datasets. Let's say you store 2 bitmasks: one is loggedInToday, the other is signedUpForNewsletter. By using a bit operation like AND (processors can do those operations really fast), you can filter out all user IDs (represented by the bit positions of the 1s) that have both logged in today and signed up for the newsletter. Because the intersection of two bitmasks can be computed at least an order of magnitude faster than the intersection of two ordered lists of IDs, you can do this operation on millions of users and still stay below 50 ms.
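As an illustration of that filter, here is a small pure-Python sketch. It imitates what `BITOP AND` does on the server (the shorter input is treated as zero-padded) but runs locally on `bytearray` values; the helper names and the sample IDs are hypothetical, and it makes no claim about timing.

```python
def bitmap_from_ids(ids):
    """Build a bitmap with one bit set per ID (offset 0 = most significant bit)."""
    size = max(ids) // 8 + 1 if ids else 0
    data = bytearray(size)
    for user_id in ids:
        data[user_id // 8] |= 0x80 >> (user_id % 8)
    return data


def bitop_and(a, b):
    """Byte-wise AND of two bitmaps, zero-padding the shorter one."""
    length = max(len(a), len(b))
    a = a.ljust(length, b"\x00")
    b = b.ljust(length, b"\x00")
    return bytearray(x & y for x, y in zip(a, b))


def set_bits(data):
    """Return the offsets of all 1 bits, in ascending order."""
    return [
        byte_index * 8 + bit_index
        for byte_index, byte in enumerate(data)
        for bit_index in range(8)
        if byte & (0x80 >> bit_index)
    ]


logged_in_today = bitmap_from_ids({0, 3, 8, 21})
signed_up_for_newsletter = bitmap_from_ids({3, 8, 9, 40})

both = bitop_and(logged_in_today, signed_up_for_newsletter)
matching_user_ids = set_bits(both)
```

On a real server the same intersection would be a single `BITOP AND destkey loggedInToday signedUpForNewsletter` call, followed by `BITCOUNT` or a scan of the result.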

To wrap up my answer: bitmasks enable some realtime analytics that would otherwise not be realtime, and they can save you a lot of memory IF you are expecting many items in a list. Note that this is just one use; there are many others (like Bloom filters).

Manuel Arwed Schmidt