I'm trying to implement tagging using Redis. This is what it looks like:

mykey (my item)
mykey:tags (a set with the tags associated to that item)
tags:tag1 (a set with references to all items tagged with "tag1")
...
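A minimal sketch of that layout, to make the three kinds of keys concrete. It assumes `r` is a redis-py style client (`set`, `sadd`); the function name `set_tagged` and the helper names are hypothetical, not part of any library.

```python
def item_tags_key(key):
    # Set holding the tags attached to one item, e.g. "mykey:tags".
    return f"{key}:tags"


def tag_key(tag):
    # Set holding every item key carrying this tag, e.g. "tags:tag1".
    return f"tags:{tag}"


def set_tagged(r, key, value, tags, ttl_seconds):
    """Store an item with a TTL and index it under each of its tags.

    `r` is assumed to expose redis-py style `set` and `sadd` methods.
    The three writes are not atomic in this sketch.
    """
    r.set(key, value, ex=ttl_seconds)
    if tags:
        r.sadd(item_tags_key(key), *tags)
        for tag in tags:
            r.sadd(tag_key(tag), key)
```

Note that only `mykey` gets a TTL here, which is exactly why the tag sets can end up holding references to items that no longer exist.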

I'm planning on using Redis Keyspace Notifications to prevent expired keys from staying in my tag sets forever (even though every item in the cache has a default TTL set, I don't like to keep stale data around).

These are the options I'm considering:

1) Subscribe to all "expired" events.

psubscribe '__keyevent@*:expired'

Pros:

  • Only 1 subscriber.

Cons:

  • Since not all items have tags, for every expired key I will have to check whether mykey:tags exists and, if it does, get the tags and remove the item from each tag set.
  • The contention with this method will increase with the number of keys in the store.
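Option 1 could be sketched roughly as follows. This assumes a redis-py style client created with `decode_responses=True` (so set members and message payloads are `str`), and that the server's `notify-keyspace-events` setting includes `Ex`. The function names are hypothetical.

```python
def remove_from_tag_sets(r, expired_key):
    """Drop an expired item from every tag set that references it.

    Returns the number of tag sets touched (0 for untagged items).
    """
    tags_key = f"{expired_key}:tags"
    tags = r.smembers(tags_key)  # empty when the item had no tags
    if not tags:
        return 0
    for tag in tags:
        r.srem(f"tags:{tag}", expired_key)
    r.delete(tags_key)
    return len(tags)


def listen_for_expirations(r):
    """Blocking loop: the single subscriber for all "expired" events."""
    pubsub = r.pubsub()
    pubsub.psubscribe("__keyevent@*__:expired")
    for message in pubsub.listen():
        # Pattern subscriptions deliver messages of type "pmessage";
        # for key-event channels the payload is the expired key name.
        if message["type"] == "pmessage":
            remove_from_tag_sets(r, message["data"])
```

Pub/sub delivery is fire-and-forget, so events published while the subscriber is disconnected are lost; a sketch like this would still leave some stale references behind.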

2) Subscribe to all events for those keys containing tags only.

psubscribe '__keyspace@*:mykey'

Pros:

  • Subscriptions will be created for those items with tags only.

Cons:

  • There must be overhead associated with each subscriber.
  • The number of subscribers can grow pretty fast depending on the number of tagged items in the store.

Questions:

  1. Which option should I implement? Should I be concerned about the number of subscribers on 2) or is the contention on 1) a bigger deal? I couldn't find any recommendations about this subject.
  2. The end game is to implement this on Redis Cluster. Does this add any extra concern to the implementation?

Update 1:

This is a generic implementation of tagging on top of our cache. I'm not sure at this point how we will end up using it. This is more of a PoC I'm working on. Here are some numbers to answer some of the questions in the comments:

  • Volume: We have tens of millions of unique visitors per day. Not all items stored in the cache for each visitor have tags, though, and this changes constantly.
  • Tags: Tags are managed. There are currently a couple dozen tags. We are considering supporting free-text tags in the future.
  • I haven't tested either of the two approaches I'm suggesting here. I was hoping that one of the options was so bad that it wasn't even an option :)

Update 2:

After some trial and error and some more research, I discarded 2). Redis limits both the number of clients and the size of the output buffers, which makes this option a no-go. You can find more information here and here. I tried 1) and it works just fine. I even set the keys to expire 5 ms apart from each other and the code handles it properly. This is a viable alternative.

Another option is the one suggested by @thepirat000. I'm marking this answer as the accepted one, but I'm also adding a little tweak to his suggestion: I don't want to do maintenance on the tags on every tag operation; instead, I can randomly determine when to do it. This is a good enough approach that uses neither pub/sub nor keyspace notifications.
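The randomized tweak might look something like this. It is only a sketch under assumptions: `r` is a redis-py style client, and `keys_for_tag`, `cleanup_probability`, and `rng` are hypothetical names introduced here for illustration.

```python
import random


def keys_for_tag(r, tag, cleanup_probability=0.1, rng=random.random):
    """Return the item keys under a tag, pruning stale ones only sometimes.

    On most calls the set is returned as-is and may still contain keys
    that have expired; with probability `cleanup_probability` each member
    is checked and the missing ones are removed from the tag set.
    """
    tag_set = f"tags:{tag}"
    members = set(r.smembers(tag_set))
    if rng() >= cleanup_probability:
        return members  # no maintenance on this call
    live = {m for m in members if r.exists(m)}
    stale = members - live
    if stale:
        r.srem(tag_set, *stale)
    return live
```

Because the caller passes the tag, no KEYS or SCAN over the keyspace is needed. The cost of a cleanup pass is one existence check per member of that tag set.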

Raciel R.
  • @RyanVincent you are right. All I have so far is speculation. Was hoping there was a clear path between the 2 options I'm considering. – Raciel R. Mar 14 '16 at 18:07

1 Answer

There will probably be too much overhead in using Keyspace Notifications for this.

Why don't you do the clean-up as a scheduled or recurring task, or even when the keys are retrieved by tag?

I've worked on something similar on CachingFramework.Redis where the cleanup is optionally run when retrieving the keys related to a tag. Also the tag set TTL is the MAX(TTL) of the keys it contains.
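The "tag set TTL is the MAX(TTL) of the keys it contains" idea can be sketched as below. This is not CachingFramework.Redis code; it is a hypothetical illustration assuming a redis-py style client (`sadd`, `ttl`, `expire`) and that every tagged item has a TTL.

```python
def add_to_tag(r, tag, key, key_ttl_seconds):
    """Add `key` to a tag set and keep the set's TTL at the max member TTL.

    `r.ttl` is assumed to follow Redis semantics: -1 means the set exists
    without an expiry (for example, it was just created by `sadd`).
    The read-then-write is not atomic in this sketch.
    """
    tag_set = f"tags:{tag}"
    r.sadd(tag_set, key)
    current = r.ttl(tag_set)
    # Only ever extend the expiry, never shorten it.
    if current < key_ttl_seconds:
        r.expire(tag_set, key_ttl_seconds)
```

With this in place a tag set disappears on its own once its longest-lived member has expired, provided no newer key has extended it in the meantime.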

thepirat000
  • Using a recurring maintenance task was an option I actually found [here](http://stackify.com/implementing-cache-tagging-redis/). What I don't like about it is the fact that you have to use KEYS or SCAN to loop through the tag sets and remove expired keys. I implemented the TTL for the tag set exactly as you mentioned, but there is always a chance that an old key expires while new keys keep the set alive forever. Eventually you end up with a ton of expired keys in the mix. – Raciel R. Mar 14 '16 at 19:08
  • I think I jumped too fast to a conclusion without actually checking your code. By passing the actual tags you don't have to SCAN or KEYS on your tags. That's probably an acceptable solution that I can use. I don't want to do this on EVERY operation, though. Thanks for the contribution, will keep this open for now. – Raciel R. Mar 14 '16 at 19:13