
I am building a website with the expectation that hundreds (I wish thousands!) of 'get' queries per day will be cached for a couple of months in the filesystem.

Reading the cache documentation, however, I observe that the default values lean towards a small cache with a fast expiry cycle.

An old post describes how a strategy like the one I imagine wreaked havoc on their servers.

Of course, the current Django code seems to have evolved since 2012. However, the cache defaults still remain the same...
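To make it concrete, this is roughly the configuration I have in mind; the path and numbers are placeholders:

```python
# settings.py -- a sketch of the plan; path and sizes are placeholders.
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.filebased.FileBasedCache',
        'LOCATION': '/var/tmp/django_cache',
        'TIMEOUT': 60 * 60 * 24 * 60,  # ~2 months, vs. the 300-second default
        'OPTIONS': {
            'MAX_ENTRIES': 100000,   # default is 300; raised for bulk
            'CULL_FREQUENCY': 4,     # cull 1/4 of entries when full
        },
    },
}
```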

I wonder whether I am on the right track or not.

My familiarity with caching is limited to enjoying the results of W3 Total Cache after it saved thousands of files in the relevant directories, without my understanding anything but its basic settings.

How would an experienced developer approach "stage 1" of this task:

Without the budget, yet, for solutions based on Redis, for example (not a valid argument, I know).

How would you cache a steadily growing number of queries, bulky in aggregate, for a long period of time on rather basic server resources?


1 Answer


Django's cache framework should be backend agnostic. For example, whether you start with the filesystem cache, Redis, or Memcached shouldn't really matter to Django.
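For instance, application code talks to the same cache API no matter which backend is configured; swapping the filesystem cache for Redis or Memcached is a settings change only. A minimal sketch (the key scheme and the stand-in query function are made up):

```python
from django.core.cache import cache

def run_expensive_query():
    # Stand-in for whatever slow ORM query you are caching.
    return {'rows': []}

def get_report(key):
    result = cache.get(key)  # same API for every backend
    if result is None:
        result = run_expensive_query()
        cache.set(key, result, timeout=60 * 60 * 24 * 60)  # ~2 months
    return result
```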

I can think of a couple issues with your approach:

  • how fast is your dataset growing? If your dataset is fairly stable in size, it shouldn't matter that the cache entries are long-lived.
  • how will you invalidate your queries? Caching queries for months suggests the data does not change; cache invalidation is a big thing to consider, as clients shouldn't see stale data (see the sketch after this list).
  • are you using the filesystem cache? If data is cached per server, are requests consistently assigned to the same servers? If not, multiple servers can hold duplicate caches; avoiding that is one of the benefits of a centralized cache (Redis/Memcached).
  • you should be able to calculate a pretty good estimate of how large a cache you'd need from your current dataset size, how much of it you'd like to cache, and your data's rate of growth. I feel like a shared cache will go very far and can be run on "basic server" resources.
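A common way to handle the invalidation point is to tie it to data changes rather than to a timeout. A minimal sketch with Django model signals, assuming a hypothetical Article model and an illustrative key scheme:

```python
from django.core.cache import cache
from django.db.models.signals import post_save, post_delete
from django.dispatch import receiver

from myapp.models import Article  # hypothetical model

@receiver([post_save, post_delete], sender=Article)
def invalidate_article_cache(sender, instance, **kwargs):
    # Delete the entry the moment the row changes, so clients
    # never see stale data no matter how long the TTL is.
    cache.delete(f'article:{instance.pk}')
```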

For stage 1, I would:

  • choose a shared cache, either Redis or Memcached; this should be a lot less painful when you start to scale to multi-server setups (see the sketch after this list)
  • estimate how much data you will need to cache, and what sort of growth you predict, to make sure your cache is of an appropriate size
  • keep in mind that cache invalidation is usually not a set policy on how long data should persist in the cache; it is governed by when your data changes, which should force-invalidate the cache so that clients don't see stale data
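As a concrete starting point, a shared Memcached cache is a settings change plus some back-of-envelope arithmetic; the address and numbers below are made up (PyMemcacheCache requires Django 3.2+; older releases shipped MemcachedCache):

```python
# settings.py -- shared-cache sketch; the address is a placeholder.
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.PyMemcacheCache',
        'LOCATION': '127.0.0.1:11211',
        'TIMEOUT': None,  # never expire; rely on explicit invalidation
    },
}

# Rough sizing: 1,000 new entries/day * 60 days retention = 60,000 entries;
# at ~2 KB each that is ~120 MB, comfortably within "basic server" RAM.
```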
  • I see the big picture. I will stay with 'stage 0' for now, try to get some benchmarks, and then consider a sound, well-founded 'stage 1' as needed, in the context of your approach. Thank you! – raratiru Feb 28 '16 at 10:51