
I had my django application configured with memcached and everything was working smoothly.

I am trying to populate the cache over time, adding to it as new data comes in from external API's. Here is the gist of what I have going on:

main view

from threading import Thread

api_query, more_results = apiQuery(**params)
cache_key = "mystring"
cache.set(cache_key, data_list, 600)

if more_results:
    t = Thread(target=apiMoreResultsQuery, args=(param1, param2, param3))
    t.daemon = True
    t.start()

more results function

cache_key = "mystring"
my_cache = cache.get(cache_key)
api_query, more_results = apiQuery(**params)
new_cache = my_cache + api_query
cache.set(cache_key, new_cache, 600)

if more_results:
    apiMoreResultsQuery(param1, param2, param3)

This method works for several iterations through apiMoreResultsQuery, but at some point cache.get returns None, causing the whole loop to crash. I've tried increasing the cache expiration, but that didn't change anything. Why would the cache be vanishing all of a sudden?

For clarification, I am running apiMoreResultsQuery in a distinct thread because I need to return a response from the initial call faster than the full dataset will populate, so I want to keep the populating going in the background while a response can still be returned.
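A defensive check keeps the loop from crashing when the key comes back as None — a minimal self-contained sketch, where FakeCache is a dict-based stand-in for django.core.cache.cache and the other names are hypothetical:

```python
# Sketch: guard against cache.get() returning None (evicted or failed-set key).
class FakeCache:
    """Dict-based stand-in for django.core.cache.cache."""
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # returns None for a missing key

    def set(self, key, value, timeout=None):
        self._data[key] = value

cache = FakeCache()
cache_key = "mystring"

def append_results(new_items):
    # Treat a vanished key as an empty list instead of crashing on None.
    existing = cache.get(cache_key) or []
    cache.set(cache_key, existing + new_items, 600)

append_results([1, 2])   # key did not exist yet -> starts from []
append_results([3])
print(cache.get(cache_key))  # [1, 2, 3]
```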

apardes

1 Answer

When you set a particular cache key and the item you are setting is larger than the size allotted for a cached item, the set fails silently, so a later cache.get for that key returns None. (I know this because I have been bitten by it.)

Django's memcached backend uses pickle to serialize cached objects, so at some point new_cache is getting pickled and the pickled payload is simply larger than the size allotted for cached items.

The memcached default item size limit is 1MB, and you can increase it, but the bigger issue, which seems a bit odd, is that you are using the same key over and over again and your single cached item just gets bigger and bigger.

Wouldn't a better strategy be to set new items in the cache and to be sure that those items are small enough to be cached?
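A minimal sketch of that strategy (the key scheme is hypothetical, and a plain dict stands in for the Django cache): store each batch of API results under its own numbered key, track how many batches exist, and reject any single item that would exceed memcached's limit.

```python
import pickle

MAX_ITEM_BYTES = 1024 * 1024  # memcached's default per-item limit (1 MB)

store = {}  # stand-in for the Django cache


def set_page(base_key, page_num, items):
    # Refuse items that would silently fail to cache.
    payload = pickle.dumps(items, -1)
    if len(payload) > MAX_ITEM_BYTES:
        raise ValueError("item too large for memcached")
    store["%s:%d" % (base_key, page_num)] = items
    # Keep an index of how many pages have been written.
    pages = store.get("%s:pages" % base_key, 0)
    store["%s:pages" % base_key] = max(pages, page_num + 1)


def get_all(base_key):
    # Reassemble the full result set from the per-page keys.
    out = []
    for n in range(store.get("%s:pages" % base_key, 0)):
        page = store.get("%s:%d" % (base_key, n))
        if page is not None:
            out.extend(page)
    return out


set_page("mystring", 0, [1, 2])
set_page("mystring", 1, [3, 4])
print(get_all("mystring"))  # [1, 2, 3, 4]
```

Each cached item stays small, and losing one page no longer wipes out the whole accumulated result.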

Anyway, if you want to see how large your item is growing, so you can test whether or not it will fit in the cache, you can do something like the following:

>>> import pickle
>>> some_object = [1, 2, 3]
>>> len(pickle.dumps(some_object, -1))
22
>>> new_object = list(range(1000000))
>>> len(pickle.dumps(new_object, -1))
4871352   # Wow, that got pretty big!

Note that this can grow a lot larger if you are pickling Django model instances, in which case it's probably recommended just to pickle the values you want from the instance.
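A rough illustration of the difference, using a plain class as a stand-in for a model instance:

```python
import pickle


# Plain class standing in for a Django model instance: real instances
# carry extra state beyond the fields you actually need.
class Article:
    def __init__(self, pk, title):
        self.pk = pk
        self.title = title
        self._state_cache = {"loaded": True, "fields": ["pk", "title"]}


articles = [Article(i, "title-%d" % i) for i in range(100)]

# Pickling the whole instances vs. pickling just the values you need.
whole = len(pickle.dumps(articles, -1))
values_only = len(pickle.dumps([(a.pk, a.title) for a in articles], -1))

print(whole > values_only)  # True: tuples of plain values pickle smaller
```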

For more reading, see this other answer:

How to get the size of a python object in bytes on Google AppEngine?

erewok
  • Thanks, that clarified things for me. Just for posterity, I was able to correct the issue by adding `-I 3M` to `/etc/memcached.conf` and restarting memcached. This increased the max object size allowed to 3MB which should be sufficient for my needs. – apardes Jul 01 '15 at 20:37
  • Is there any reason this would revert overnight? Had it working no problem yesterday but now I'm back to square one. Getting `None` when I try to pull the cache. – apardes Jul 02 '15 at 20:53
  • Not that I know of. I would first suspect the size of the item, based on the code you originally posted. It's really easy for some aspect of it to result in a huge item. That's also why I never rely on adjusting default cache size: I always check the sizes of items when their sizes are variable. – erewok Jul 02 '15 at 20:55
  • I have been checking the size, it seems to be happening right as the size reaches 1MB again even though the limit is increased. This was definitely working yesterday. I'm quite confused. – apardes Jul 02 '15 at 20:56