
I use Django as an API for my mobile front end. I just send JSON back and forth. I've created an endpoint for a home feed. Each user has a unique home feed depending on the people that they follow. Users post a photo, and that photo is pushed out to all of their followers' home feeds. Pretty simple and straightforward thus far.
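For context, that push step can be sketched roughly like this (a minimal sketch, assuming a redis-py style client; `push_photo_to_feeds` and the follower-id lookup are hypothetical names, and the key scheme mirrors the one described below):

```python
def push_photo_to_feeds(redis_client, photo_pk, follower_ids, max_len=100):
    """Push a new photo's PK onto each follower's home-feed list.

    `redis_client` is any object with lpush/ltrim (e.g. redis.Redis);
    the 'homefeed:user_id:%s' key format matches the feed view's reads.
    """
    for follower_pk in follower_ids:
        key = 'homefeed:user_id:%s' % follower_pk
        redis_client.lpush(key, photo_pk)        # newest first (DESC by time)
        redis_client.ltrim(key, 0, max_len - 1)  # cap the feed length
```

The `ltrim` keeps each feed bounded so Redis memory doesn't grow without limit.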

A couple of my colleagues suggested that I should implement some sort of caching layer, but the problem is, this isn't just a regular static site. Each view is dynamic based on the user accessing it.

So for example, the home feed is a list of photos posted on the platform in DESC order of time (recent to old).

The home feed view is very basic. Each user has a 'homefeed:user_id:%s' list in Redis which contains the primary keys of the photo objects. I make a call to Redis to grab the request.user's home feed list, then query the database for those objects like so:

homefeed_pk_list = redis_server.lrange('homefeed:user_id:%s' % request.user.pk, 0, 100)

# Home feed queryset
queryset = Photo.objects.filter(pk__in=homefeed_pk_list)
response_data = []
for photo in queryset:
    # Code to return back JSON data
return HttpResponse(json.dumps(response_data), content_type="application/json")

Pretty simple. Now my question is: what's the best practice for caching in this case? I could cache each serialized photo object individually and set an expiry of 24 hours, since some photo objects appear in multiple users' feeds. If an object doesn't exist in the cache, I'll hit the DB. What do you think of this approach?
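For what it's worth, that per-object idea can be sketched independently of any framework like this (the `cache` argument stands in for Django's cache, which exposes the same `get_many`/`set_many` calls; `fetch_photos` and the `photo:%s` key scheme are made-up names for illustration):

```python
PHOTO_TTL = 60 * 60 * 24  # the proposed 24-hour expiry


def photo_key(pk):
    # Hypothetical per-photo cache key; any unique scheme works.
    return 'photo:%s' % pk


def get_serialized_photos(photo_pks, cache, fetch_photos):
    """Per-object caching: look each photo up in `cache` first and hit
    the DB (via `fetch_photos`) only for the misses.

    `fetch_photos(pks)` should return {pk: serialized_dict}.
    """
    keys = {photo_key(pk): pk for pk in photo_pks}
    cached = cache.get_many(keys)                  # one round trip for all hits
    missing = [pk for key, pk in keys.items() if key not in cached]
    if missing:
        fresh = fetch_photos(missing)              # single DB query for misses
        cache.set_many(
            {photo_key(pk): data for pk, data in fresh.items()},
            PHOTO_TTL,
        )
        cached.update({photo_key(pk): data for pk, data in fresh.items()})
    # Preserve the feed's DESC ordering from the PK list
    return [cached[photo_key(pk)] for pk in photo_pks if photo_key(pk) in cached]
```

Because `get_many`/`set_many` batch the cache traffic, a 100-photo feed costs at most one cache round trip plus one DB query for whatever was missing.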

deadlock
1 Answer


For maximum performance, you can implement something akin to Russian Doll Caching, the summary of which is something like: Cache objects, cache lists of those objects, cache generated pages that contain that list (i.e., don't just cache the finished result, cache all the way down).
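In key terms, "caching all the way down" can look something like this sketch (the key formats here are illustrative, not a fixed convention):

```python
import hashlib


def photo_fragment_key(pk, updated_at):
    # Inner key: embeds the photo's last-modified stamp, so editing the
    # photo yields a fresh key and the stale entry simply expires.
    return 'photo:%s:%s' % (pk, updated_at)


def feed_key(user_pk, fragment_keys):
    # Outer key: hashes all inner keys together, so a change to any one
    # photo changes the feed key too -- no manual invalidation needed.
    digest = hashlib.sha1(''.join(fragment_keys).encode('utf-8')).hexdigest()
    return 'feed:%s:%s' % (user_pk, digest)
```

The outer keys are derived from the inner ones, which is what makes the nesting self-invalidating.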

However, given your example, I might start with:

import hashlib
import json

from django.core.cache import cache
from django.http import HttpResponse

from whereever import redis_server

from yourapp.models import Photo  # wherever your Photo model lives


def feed(request):
    """
    Returns a JSON response containing Photo data.
    """
    # Get the list of PKs from Redis
    photo_pks = redis_server.lrange(
        'homefeed:user_id:%d' % request.user.pk,
        0,
        100
    )

    # Make a SHA1 hash of the PKs (cache key)
    cache_key = hashlib.sha1(str(photo_pks).encode('utf-8')).hexdigest()

    # Get the existing cache
    content = cache.get(cache_key)

    if content is None:
        # Make a queryset of Photos using the PK list
        queryset = Photo.objects.filter(pk__in=photo_pks)

        # Use .values() to get dicts, wrapped in list() so the lazy
        # queryset is JSON-serializable
        content = json.dumps(
            list(queryset.values('pk', 'url', 'spam', 'eggs'))
        )

        # Cache the response string for 24 hours
        cache.set(cache_key, content, 60 * 60 * 24)

    return HttpResponse(content, content_type='application/json')

The result will be that the response content will be cached for 24 hours, or until the PK list in Redis (presumably set elsewhere and updated when a new photo is added, etc.) changes, since the cache key is made using a hash of the PK list.

orokusaki
  • thank you for the quick response @orokusaki. Is the point of the SHA1 hash to generate unique random key names? Also in your opinion, what factors do you consider for how long a key should live in the cache? – deadlock Oct 07 '14 at 17:14
  • @noa the SHA1 is just a way to make a unique-ish string of a fixed length from a longer string of unpredictable length (I turn the list of PKs into a string). The SHA1 of the PKs will always be the same until the list changes. This allows the cache to be used until the list changes (resulting in a different SHA1). Typically, the deeper the cache, the longer the cache timeout (e.g., you might cache your homepage for just 30 minutes, knowing that out of thousands of requests, only 1 every 30 minutes will result in queries, etc.). – orokusaki Oct 07 '14 at 17:32
  • 1
    @noa (continued) a lower level cache (caching an individual photo's properties, etc.) might benefit from a 24h timeout, if it only gets requested once every half hour or so, ensuring that it has a useful lifespan (1hr would result in the cache being read just a few times before being deleted). If you have access to the point at which an object is changed or deleted, you can cache it for as long as you want, knowing that you will be able to delete it from cache as soon as it's changed. However, I don't cache longer than 24 hours, because you end up using all the RAM. – orokusaki Oct 07 '14 at 17:37