
I am using a suffix tree wrapper for Python: https://hkn.eecs.berkeley.edu/~dyoo/python/suffix_trees/

I need the same instance of the suffix tree every time a view is called in Django. So I store the suffix tree instance in the Django cache and retrieve it whenever I need it.

Problem 1: Every time I retrieve it from the cache, it has a different memory location, even though Python stores data using references.

Problem 2: After two retrievals, Python crashes with "Segmentation fault (core dumped)".

Ques 1: Why does the suffix tree instance change its memory location when it is retrieved from the cache?

Ques 2: Why does it cause a segmentation fault?

Ques 3: Is there any other way to persist a suffix tree instance in Django so that the same instance is reused?

$ python manage.py shell                        
Python 2.7.5 (default, Mar 22 2016, 00:57:36) 
[GCC 4.7.2 20121109 (Red Hat 4.7.2-8)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> import SuffixTree
>>> d=SuffixTree.SubstringDict()
>>> d["3132"]=1
>>> d["3"]
[1]
>>> d["4343"]=2
>>> d["3"]                                                                     
[1, 2]
>>> from django.core.cache import cache
>>> cache.set("c",d,1000)                                                      
>>> d
<SuffixTree.SubstringDict.SubstringDict instance at 0x27bd830>
>>> cache.get("c")
<SuffixTree.SubstringDict.SubstringDict instance at 0x27ca908>
>>> cache.get("c")
<SuffixTree.SubstringDict.SubstringDict instance at 0x27ca9e0>
>>> cache.get("c")
Segmentation fault (core dumped)
Nakshatra

1 Answer


The core of the problem is that Django's cache does not keep live object references: every object you put in the cache is serialized (pickled) before storage and deserialized when you read it back. Each retrieval therefore creates a new object that is a copy of the stored one, which is why the memory address changes.
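The copy-on-retrieval behaviour can be reproduced without Django. The sketch below is only an illustration: `Tree` is a hypothetical stand-in for the cached object (not the real SuffixTree class), and `round_trip` approximates what a cache set followed by a cache get does to a value.

```python
import pickle


class Tree(object):
    """Hypothetical stand-in for the cached object (not the real SuffixTree)."""

    def __init__(self, words):
        self.words = list(words)


def round_trip(obj):
    """Serialize and deserialize obj, roughly what a cache set/get pair does."""
    return pickle.loads(pickle.dumps(obj))


original = Tree(["3132", "4343"])
first = round_trip(original)
second = round_trip(original)

# Each retrieval yields an equal but distinct object.
print(first is original, first is second)  # False False
print(first.words == original.words)       # True
```

For a pure-Python object this only costs time. An object that wraps state held by a C extension may not survive such a round trip at all.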

It is implemented this way because in a production environment you will usually have more than one Django worker process (possibly running on different servers), and all of those workers need to share the same cache. So you cannot have the same instance on every request, because your requests can be handled by different workers.

The workaround for this problem will vary depending on the purpose of your app.

According to your comment, you can create a module that caches an instance between requests:

from datetime import timedelta, datetime

MAX_LIVE_TIME = timedelta(seconds=3600)

_tree_instance = None
_tree_timestamp = None

def get_cached_tree():
    global _tree_instance, _tree_timestamp
    # Reuse the cached tree while it is younger than MAX_LIVE_TIME
    if _tree_instance is not None and datetime.now() - _tree_timestamp < MAX_LIVE_TIME:
        return _tree_instance

    # Build (or rebuild) the tree and remember when it was built
    _tree_instance = 'SuffixTree' # Replace this with SuffixTree creation
    _tree_timestamp = datetime.now()
    return _tree_instance

Then call get_cached_tree() in your views to get the SuffixTree. You will still have different instances on different workers, but it will be much faster, and because the tree never goes through the cache's serialization it should not hit the segfault.
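If a worker serves requests from several threads, two requests could trigger a rebuild at the same moment. Here is a minimal sketch of the same per-process idea guarded by a lock. The names `TimedCache`, `build_tree` and `tree_cache` are hypothetical, and the dictionary returned by `build_tree` is only a placeholder for the real SuffixTree construction.

```python
import threading
import time


class TimedCache(object):
    """Keep one built object per process and rebuild it after max_age seconds."""

    def __init__(self, builder, max_age=3600.0, clock=time.time):
        self._builder = builder      # callable that builds the expensive object
        self._max_age = max_age
        self._clock = clock          # injectable so expiry can be tested
        self._lock = threading.Lock()
        self._value = None
        self._built_at = None

    def get(self):
        with self._lock:
            now = self._clock()
            expired = self._built_at is None or now - self._built_at >= self._max_age
            if expired:
                # Only one thread at a time can reach the rebuild.
                self._value = self._builder()
                self._built_at = now
            return self._value


def build_tree():
    # Placeholder: replace with the real SuffixTree construction.
    return {"3132": 1, "4343": 2}


tree_cache = TimedCache(build_tree, max_age=3600.0)
assert tree_cache.get() is tree_cache.get()  # same instance within one process
```

A view would call `tree_cache.get()` instead of building the tree itself. Each worker process still builds and holds its own copy, so this trades memory for avoiding both the rebuild on every request and the serialization step.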

P.S. The segmentation fault is the consequence of a bug in the Python interpreter you use or, more likely, a bug in the package you use. Make sure you are using the latest version of the package (https://github.com/JDonner/SuffixTree); if that doesn't help, analyze the stack trace (core dump) and submit a bug report to the SuffixTree repo.

  • Alright, thanks for the answer. But is there still any way to use the same instance of an object multiple times, or to store that instance somewhere so it can be reused? – Nakshatra Apr 21 '16 at 08:40
  • I get the same problem with the latest package you linked. – Nakshatra Apr 21 '16 at 08:44
  • You can create and store an instance in some kind of global variable. It is a really bad approach, because everything will have access to it. And even then you will have different instances in different workers, and you will have to implement syncing between them. Maybe I can give you better advice if you tell me why you need the same instance on each response. – Evgeny Barbashov Apr 21 '16 at 08:48
  • I am building a search tool based on Django and a suffix tree. The tool builds a SuffixTree from the list of strings in my database. These strings change, but rarely: 1-2 times an hour. So I don't want the SuffixTree to be rebuilt on every request. Instead, it should be cached for 1 hour and then rebuilt. So I tried to store my suffix tree instance in the cache, but that creates the problem you can see in the description. – Nakshatra Apr 21 '16 at 09:02
  • Why can't you rebuild the tree on every request? Is it slow or something like that? – Evgeny Barbashov Apr 21 '16 at 09:08
  • It is extremely slow. There will be around 100 requests per second, and my database holds around 10k strings. Anyhow, I need to cache the build. – Nakshatra Apr 21 '16 at 09:10
  • I've updated the answer, check it. I haven't tested the code, so it may have errors. – Evgeny Barbashov Apr 21 '16 at 09:38
  • Thanks for the answer. I will try your solution; if some problem persists I will ask you again. :) – Nakshatra Apr 21 '16 at 09:41
  • I have a question. Where do I put this module? And how will it keep the instance alive for the time limit? – Nakshatra Apr 21 '16 at 09:45
  • You can create a new `cached_tree.py` file in your app directory and import the function with `from .cached_tree import get_cached_tree` in your `views.py`. Django workers that serve HTTP requests run as daemons, so they will live longer than the time limit. – Evgeny Barbashov Apr 21 '16 at 09:50
  • But it is still building a new suffix tree on every new request, the same as when I put the code directly into views.py. – Nakshatra Apr 21 '16 at 10:07
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/109786/discussion-between-evgeny-barbashov-and-nakshatra). – Evgeny Barbashov Apr 21 '16 at 10:11