Accessing Johnny Cache Data

Question

I am relatively new to Python and was wondering whether it is possible to access the results cached by Johnny Cache for further processing before returning them, e.g. by running further queries on them.

As a simplified example, consider a table with hundreds of thousands of sports results, each categorised by sport, e.g. tennis, football, golf, etc. Some users are only interested in football and golf, so at the moment we use Johnny Cache to cache the results of a query for each sport category for 30 minutes. However, we can't pass this data to the user as is, because it needs further filtering for the user's preferences (e.g. they only want results for certain teams or players). A database call for the category plus the user's preferences would be prohibitive, which is why we cache the part of the query that forms the base of all requests (the sport category). We now want to filter that in-memory cached data further for each user's preferences. Can this be done with Johnny Cache, and if so, how?

Are you saying that you want to also cache the user specific result? If so, why not just make that the original query (I know that you said that would be prohibitive, but doing it in that roundabout way would seem much more prohibitive to me, although I don't know the specifics of your database indexes, size, etc)? If not, I don't really see the problem because you would always be modifying the cached sports query on the fly anyways. — ubomb, Nov 01 '13 at 15:59
@ubomb I don't want to cache the user-specific result. I want to query the cached general results. — RunLoop, Nov 01 '13 at 16:09
Right. So since Johnny Cache would automatically cache the sports query, wouldn't this be just a question on how to filter the resulting set from that query - entirely independent of Johnny Cache? In which case, again, more would have to be known about the actual database. — ubomb, Nov 01 '13 at 16:18
Is there something else you are looking for that isn't answered? QuerySet methods such as `filter` modify the SQL that Django generates to perform the query. If you want to avoid a database hit and process the results further, you'll need to do it outside Django's ORM. — Tim Edgar, Nov 05 '13 at 03:27

score 5 · Accepted Answer · answered Nov 01 '13 at 17:01

The short answer is yes, but you won't be able to use QuerySet filters without causing another database call. To avoid a database hit, you'll need to iterate through the returned results yourself. Whether that is worthwhile depends on the size of the returned results compared with the query time of the new, filtered query.
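As a minimal sketch of that in-Python filtering for the sports scenario in the question, assume each cached result exposes `team` and `player` attributes. `SportResult` and `filter_for_user` are hypothetical names, with `SportResult` standing in for the model instances the cached per-category query would return:

```python
from collections import namedtuple

# Hypothetical stand-in for the model instances returned by the cached
# per-category query; in a real view you would iterate over that QuerySet.
SportResult = namedtuple("SportResult", ["sport", "team", "player"])


def filter_for_user(cached_results, teams=(), players=()):
    """Return cached results matching any of the user's teams or players.

    This runs entirely in Python, so it issues no further database
    (or cache) query.
    """
    teams = set(teams)      # sets give fast membership tests
    players = set(players)
    return [r for r in cached_results
            if r.team in teams or r.player in players]


# Hypothetical sample data for one sport category.
football = [
    SportResult("football", "Team A", "Player 1"),
    SportResult("football", "Team B", "Player 2"),
    SportResult("football", "Team C", "Player 3"),
]

matches = filter_for_user(football, teams=["Team B"], players=["Player 3"])
```

Iterating over a QuerySet evaluates it once and stores the rows on that QuerySet object, so several in-Python passes over the same evaluated `objs` do not re-query.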

As mentioned in the QuerySet documentation, filtering a QuerySet returns a new QuerySet that is in no way bound to the original, so evaluating it runs its own query.

To understand the situation further, you can listen for the signals johnny.signals.qc_hit and johnny.signals.qc_miss to see when a query is served from the cache and when it goes to the database. Signals are a Django mechanism for binding callbacks to certain events; in this case, Johnny Cache exposes these two useful signals.

I created a simple application to test it out and to help demonstrate this behavior.

models.py

from django.db import models

class TestModel(models.Model):
    prop_a = models.TextField()
    prop_b = models.TextField()

    def __unicode__(self):
        return "{} {}".format(self.prop_a, self.prop_b)

views.py

from django.dispatch import receiver
from django.http import HttpResponse

from johnny.signals import qc_hit, qc_miss
from .models import TestModel

def index(request):
    objs = TestModel.objects.all()
    print objs
    print objs.filter(prop_a='a')  # Causes another database or cache hit
    return HttpResponse("success")

def generate(request):
    generate_data()
    return HttpResponse("generated")

def generate_data():
    # Create one TestModel row for every (prop_a, prop_b) combination
    properties = ['a', 'b', 'c', 'd', 'e']
    for a in properties:
        for b in properties:
            TestModel.objects.create(prop_a=a, prop_b=b)

@receiver(qc_hit)
def cache_hit(sender, **kwargs):
    print "cache hit"

@receiver(qc_miss)
def cache_miss(sender, **kwargs):
    print "cache miss"

Because Johnny Cache works through middleware, you'll need to test it through a view, since the caching happens between request and response. In the example above, we have a very simple model: the view fetches all TestModel objects and then a filtered result. The output shows each query causing a cache miss the first time and a cache hit on subsequent requests. The two queries aren't related and are cached as separate queries.
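One way to see why the two queries are cached separately is a toy model of a query cache keyed on the SQL text and its parameters. This is a simplified sketch, not Johnny Cache's actual implementation (which, among other things, invalidates entries when the underlying tables are written to); `QueryCache`, `fake_db` and the SQL strings are hypothetical:

```python
class QueryCache(object):
    """Toy cache that stores query results keyed on (SQL, params)."""

    def __init__(self, run_query):
        self._run_query = run_query  # callable that actually hits the database
        self._store = {}
        self.hits = 0
        self.misses = 0

    def execute(self, sql, params=()):
        key = (sql, tuple(params))
        if key in self._store:
            self.hits += 1    # loosely analogous to Johnny Cache's qc_hit
        else:
            self.misses += 1  # loosely analogous to qc_miss: go to the DB
            self._store[key] = self._run_query(sql, params)
        return self._store[key]


def fake_db(sql, params):
    # Hypothetical database call that returns a placeholder result.
    return "rows for %s %r" % (sql, params)


cache = QueryCache(fake_db)
all_sql = "SELECT * FROM test_model"
filtered_sql = "SELECT * FROM test_model WHERE prop_a = %s"

for _ in range(2):  # simulate two requests to the index view
    cache.execute(all_sql)
    cache.execute(filtered_sql, ["a"])

# After two simulated requests: cache.misses == 2 and cache.hits == 2
```

The filtered query has different SQL, so it gets its own key and its own initial miss; it never reuses the entry stored for the unfiltered query.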

However, if you do something like

objs = TestModel.objects.all()
result = []
for obj in objs:
    if obj.prop_a == 'a':
        result.append(obj)

you'll only see one hit to the database / Johnny Cache. This gets your desired result, but it may be faster or slower than a second query depending on the size of the initial result set.
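The loop above could be generalized into a small helper that accepts keyword arguments in the style of `filter(prop_a='a')` but tests them in Python against objects that have already been fetched. `filter_fetched` and `Row` are hypothetical, and this sketch only supports exact-match comparisons, not Django's double-underscore lookups such as `prop_a__in`:

```python
from operator import attrgetter


def filter_fetched(objs, **criteria):
    """Return the objects whose attributes equal every given value.

    Only exact equality is supported; Django lookups such as
    prop_a__startswith are NOT interpreted here.
    """
    getters = [(attrgetter(name), value) for name, value in criteria.items()]
    return [obj for obj in objs
            if all(get(obj) == value for get, value in getters)]


class Row(object):
    # Hypothetical stand-in for TestModel instances.
    def __init__(self, prop_a, prop_b):
        self.prop_a = prop_a
        self.prop_b = prop_b


# The same 25 (prop_a, prop_b) combinations that generate_data() creates.
rows = [Row(a, b) for a in "abcde" for b in "abcde"]

only_a = filter_fetched(rows, prop_a="a")
a_and_c = filter_fetched(rows, prop_a="a", prop_b="c")
```

In a view you would pass the fetched objects, e.g. `filter_fetched(TestModel.objects.all(), prop_a='a')`, and iterating the QuerySet inside the helper still costs only the one query.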

I hope this helps answer your question as well as give you an approach to understand how the caching works further.
