I'm trying to implement tagging using Redis. This is what the layout looks like:
mykey (my item)
mykey:tags (a set with the tags associated to that item)
tags:tag1 (a set with references to all items tagged with "tag1")
...
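A minimal sketch of this layout in Python, for illustration only. It assumes `r` is a redis-py client created with `decode_responses=True`; the helper names (`tags_key`, `tag_key`, `store_tagged`) are hypothetical and not part of any library.

```python
def tags_key(item_key):
    """Set holding the tags attached to one item, e.g. 'mykey:tags'."""
    return f"{item_key}:tags"


def tag_key(tag):
    """Set holding every item key that carries `tag`, e.g. 'tags:tag1'."""
    return f"tags:{tag}"


def store_tagged(r, item_key, value, tags, ttl_seconds):
    """Store an item with a TTL and register it under each of its tags."""
    r.set(item_key, value, ex=ttl_seconds)
    if tags:
        # Forward index: item -> tags.
        r.sadd(tags_key(item_key), *tags)
        # Reverse index: tag -> items.
        for tag in tags:
            r.sadd(tag_key(tag), item_key)
```

Only the item itself gets a TTL in this sketch. The two index sets are left without one, which is exactly why stale references can pile up in the tag sets once the item expires.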
I'm planning on using Redis Keyspace Notifications to prevent expired keys from staying in my tag sets forever (even though every item in the cache has a default TTL set, I don't like to keep stale data around).
These are the options I'm considering:
1) Subscribe to all "expired" events.
psubscribe '__keyevent@*__:expired'
Pros:
- Only 1 subscriber.
Cons:
- Since not all items have tags, I will have to check whether mykey:tags exists and, if it does, get the tags and remove the item from each tag set.
- The contention with this method will increase with the number of keys in the store.
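Roughly what the single subscriber in option 1 would do, as a sketch rather than a finished implementation. It assumes a redis-py client `r` created with `decode_responses=True`, and that the server has expired events enabled (for example `notify-keyspace-events` set to `Ex`). The function names are hypothetical.

```python
EXPIRED_PATTERN = "__keyevent@*__:expired"


def untag_expired(r, item_key):
    """Remove an expired item from every tag set it belonged to.

    Returns the number of tag sets that were touched.
    """
    item_tags_key = f"{item_key}:tags"
    tags = r.smembers(item_tags_key)  # empty set if the item had no tags
    for tag in tags:
        r.srem(f"tags:{tag}", item_key)
    if tags:
        r.delete(item_tags_key)
    return len(tags)


def listen_for_expirations(r):
    """Block forever, cleaning up tag sets as keys expire."""
    pubsub = r.pubsub()
    pubsub.psubscribe(EXPIRED_PATTERN)
    for message in pubsub.listen():
        # Subscription confirmations arrive with other message types.
        if message["type"] == "pmessage":
            # On keyevent channels the payload is the name of the key.
            untag_expired(r, message["data"])
```

Every expiration costs at least one `SMEMBERS` call here, even for untagged items, which is the extra work described in the first con.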
2) Subscribe to all events only for those keys that have tags.
psubscribe '__keyspace@*__:mykey'
Pros:
- Subscriptions will be created only for items with tags.
Cons:
- There must be overhead associated with each subscriber.
- The number of subscribers can grow pretty fast depending on the number of tagged items in the store.
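For comparison, a small sketch of what option 2 deals with per key. On a keyspace channel the channel name carries the key and the payload carries the event name, so the handler has to filter events. The helper names are hypothetical, and treating `del` the same as `expired` is my own assumption.

```python
def keyspace_channel(item_key, db=0):
    """Channel that receives every event for one key in one database."""
    return f"__keyspace@{db}__:{item_key}"


def key_from_channel(channel):
    """Recover the key name from a keyspace channel name."""
    # Split on the first ':' only, since key names may contain ':' too.
    return channel.split(":", 1)[1]


def should_untag(event):
    """Decide whether an event means the item is gone (assumed policy)."""
    return event in ("expired", "del")
```

For example, `keyspace_channel("mykey")` gives `__keyspace@0__:mykey`, and one such channel (or pattern) is needed for every tagged item.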
Questions:
- Which option should I implement? Should I be concerned about the number of subscribers on 2) or is the contention on 1) a bigger deal? I couldn't find any recommendations about this subject.
- The end game is to implement this on Redis Cluster. Does this add any extra concern to the implementation?
Update 1:
This is a generic implementation for tagging on top of our cache. I'm not sure at this point how we will end up using it. This is more like a PoC I'm working on. Here are some numbers to answer some of the questions in the comments:
- Volume: We have tens of millions of unique visitors per day. Not all items stored in the cache for each visitor have tags, though. But this changes constantly.
- Tags: Tags are managed. There are currently a couple of dozen tags. We are considering supporting free-text tags in the future.
- I haven't tested either of the two approaches I'm suggesting here. I was hoping that one of the options was so bad that it was not even an option :)
Update 2:
After some trial and error and some more research, I discarded 2). There is a limit on the number of Redis clients as well as on the output buffers, which makes this option a no-go. You can find more information here and here. I tried 1) and it works just fine. I even set the expiration of the keys 5ms apart from each other and the code handles it properly. This can be a viable alternative.
Another option is the one suggested by @thepirat000. I'm marking this answer as the accepted one, but I'm also adding a little tweak to his suggestion: I don't want to do maintenance on the tags on every tag operation; instead, I can randomly determine when to do it. This is a good enough approach that uses neither pub/sub nor keyspace notifications.
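A rough sketch of the randomized maintenance idea. The details of the accepted answer are not reproduced here, so this rests on two assumptions: that "maintenance" means removing members of a tag set whose item key no longer exists, and that `r` is a redis-py client created with `decode_responses=True`. The function names and the default probability are hypothetical.

```python
import random


def prune_tag(r, tag):
    """Drop members of a tag set whose item key no longer exists."""
    key = f"tags:{tag}"
    stale = [item for item in r.smembers(key) if not r.exists(item)]
    if stale:
        r.srem(key, *stale)
    return len(stale)


def items_for_tag(r, tag, prune_probability=0.1, rng=random.random):
    """Return the item keys for a tag, pruning stale ones only now and then."""
    # rng is injectable so the behaviour can be tested deterministically.
    if rng() < prune_probability:
        prune_tag(r, tag)
    return r.smembers(f"tags:{tag}")
```

The trade-off is that a read may still return keys that have already expired until a prune happens to run, so callers need to tolerate a missing item.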