
My django application deals with 25MB binary files. Each of them has about 100,000 "records" of 256 bytes each.

It takes me about 7 seconds to read the binary file from disk and decode it using python's struct module. I turn the data into a list of about 100,000 items, where each item is a dictionary with values of various types (float, string, etc.).
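A rough sketch of that loading loop, for reference (the real record layout isn't shown here, so the struct format string and field names below are placeholders):

import struct

RECORD_SIZE = 256
RECORD_FORMAT = "<df244s"  # placeholder layout: a double, a float and a 244-byte NUL-padded string

def load_records(path):
    records = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD_SIZE)
            if len(chunk) < RECORD_SIZE:
                break
            value_a, value_b, raw_name = struct.unpack(RECORD_FORMAT, chunk)
            records.append({
                "value_a": value_a,
                "value_b": value_b,
                "name": raw_name.rstrip(b"\x00").decode("ascii", "replace"),
            })
    return records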

My django views need to search through this list. Clearly 7 seconds is too long.

I've tried using django's low-level caching API to cache the whole list, but that won't work because there's a maximum size limit of 1MB for any single cached item. I've tried caching the 100,000 list items individually, but that takes a lot more than 7 seconds - most of the time is spent unpickling the items.
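For reference, the two caching attempts looked roughly like this (key names and timeouts are arbitrary, and records is the list built by the loading step above):

from django.core.cache import cache

# Attempt 1: cache the whole list in one entry - fails, because the pickled
# list is far larger than memcached's default 1MB item limit.
cache.set("all_records", records, 60 * 10)

# Attempt 2: cache each record individually - the values fit, but getting and
# unpickling 100,000 entries per request is slower than re-parsing the file.
for i, record in enumerate(records):
    cache.set("record:%d" % i, record, 60 * 10)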

Is there a convenient way to store a large list in memory between requests? Can you think of another way to cache the object for use by my django app?

phugoid
  • is using numpy.array possible? – James R Aug 08 '12 at 23:09
  • Have you considered using a database? – Ignacio Vazquez-Abrams Aug 08 '12 at 23:17
  • James R: I can't see the connection between numpy.array and my caching problem. Ignacio: Yes, that could work - I'd store each of the 100,000 items as a django model instance, and use the ORM to recall the data on subsequent requests... But I was hoping to find a simpler memory-based solution. – phugoid Aug 08 '12 at 23:42
  • numpy.array tends to be efficient when storing large arrays of numbers. If you aren't storing numbers, then it's not something that would be useful. – James R Aug 09 '12 at 01:58
  • Also, if it's flat, you might consider using Mongo or Cassandra. – James R Aug 09 '12 at 01:59

2 Answers


Raise memcached's item size limit to 10m (the default is 1m): add

-I 10m

to /etc/memcached.conf and restart memcached.

Also edit the MemcachedCache class in memcached.py (located in /usr/lib/python2.7/dist-packages/django/core/cache/backends) so it looks like this:

class MemcachedCache(BaseMemcachedCache):
    "An implementation of a cache binding using python-memcached"
    def __init__(self, server, params):
        import memcache
        memcache.SERVER_MAX_VALUE_LENGTH = 1024 * 1024 * 10  # added limit to accept 10mb
        super(MemcachedCache, self).__init__(server, params,
                                             library=memcache,
                                             value_not_found_exception=ValueError)
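With both limits raised, the whole list fits in a single cache entry, so a view can use a plain get-or-set pattern. The key name, timeout and loader function below are only examples:

from django.core.cache import cache

def get_records():
    records = cache.get("binary_file_records")
    if records is None:
        records = load_records("/path/to/data.bin")  # your existing struct-based loader
        cache.set("binary_file_records", records, 60 * 15)
    return records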
FizxMike
  • This is on Ubuntu 12.04 LTS of course... file locations may vary for your distro. – FizxMike Mar 13 '13 at 11:05
  • 2
    Just wasted several hours trying to work this out... couldn't figure out if memcache or Django was to blame, then realised it was both! Cheers – Alex Aug 11 '13 at 06:25
  • See also http://stackoverflow.com/questions/16490819/how-to-tell-django-that-memcached-running-with-item-size-larger-than-default – Alex Aug 12 '13 at 04:52
  • Actually, this did NOT work for me due to unexpected behaviour from python-memcache, which I've just reported. See https://github.com/linsomniac/python-memcached/issues/13 – Alex Aug 12 '13 at 04:58
  • This doesn't work for me either. I fixed it using pylibmc, instead of python-memcache. – sefakilic Oct 21 '13 at 21:37
  • This is no longer necessary, and can be done from settings, see https://stackoverflow.com/a/66762884/548736 – andyhasit Mar 23 '21 at 12:13

I'm not able to add comments yet, but I wanted to share my quick fix for this problem, since I also ran into python-memcached behaving strangely when SERVER_MAX_VALUE_LENGTH is changed at import time.

Besides the __init__ edit that FizxMike suggests, you can also override the _cache property in the same class. That way you can instantiate the python-memcached Client with server_max_value_length passed explicitly, like this:

from django.core.cache.backends.memcached import BaseMemcachedCache

DEFAULT_MAX_VALUE_LENGTH = 1024 * 1024

class MemcachedCache(BaseMemcachedCache):
    def __init__(self, server, params):
        # OPTIONS from the settings.CACHES entry for this connection
        self._options = params.get("OPTIONS", {})
        import memcache
        memcache.SERVER_MAX_VALUE_LENGTH = self._options.get('SERVER_MAX_VALUE_LENGTH', DEFAULT_MAX_VALUE_LENGTH)

        super(MemcachedCache, self).__init__(server, params,
                                             library=memcache,
                                             value_not_found_exception=ValueError)

    @property
    def _cache(self):
        if getattr(self, '_client', None) is None:
            server_max_value_length = self._options.get("SERVER_MAX_VALUE_LENGTH", DEFAULT_MAX_VALUE_LENGTH)
            # One could optionally pass more client parameters here through OPTIONS;
            # simplified here for brevity.
            self._client = self._lib.Client(self._servers,
                server_max_value_length=server_max_value_length)

        return self._client

I also prefer to create a separate backend that inherits from BaseMemcachedCache and point Django at it, instead of editing Django's code.
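For example, the custom backend can be selected entirely from settings; the module path below is hypothetical, and the OPTIONS value should match the -I limit given to memcached:

# settings.py
CACHES = {
    "default": {
        "BACKEND": "myproject.cache_backends.MemcachedCache",  # hypothetical path to the subclass above
        "LOCATION": "127.0.0.1:11211",
        "OPTIONS": {
            "SERVER_MAX_VALUE_LENGTH": 1024 * 1024 * 10,  # keep in sync with memcached's -I setting
        },
    },
}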

Here's the Django memcached backend module for reference: https://github.com/django/django/blob/master/django/core/cache/backends/memcached.py

Thanks for all the help on this thread!

Morvader
kibe