I have deployed a website on Google App Engine (GAE) with Python.

Since GAE does not guarantee "keep-alive", I have implemented a stateless server:

  1. Upon every change in internal variables, they are immediately stored into GQL data-base.

  2. Whenever the process is started, all internal variables are loaded from GQL data-base.

I have a scenario which rarely throws an exception, and I have not been able to track it down:

  • Client sends a synchronous AJAX POST request.

  • Server creates a session and sends a unique session-ID in the response.

  • Client sends a synchronous AJAX GET request with the session-ID as an argument.

  • Server sends some text message in the response.

Since the client requests are synchronous, the entire sequence is also synchronous.

Here is the relevant mapping within my server:

from webapp2 import WSGIApplication
from Handler import MyRequestHandler

app = WSGIApplication([
    ('/request1'    ,MyRequestHandler), # post request
    ('/request2(.*)',MyRequestHandler), # get request
])

Here is the relevant request-handling within my server:

from webapp2 import RequestHandler
from Server  import MyServer

myServer = MyServer()

class MyRequestHandler(RequestHandler):
    def post(self):
        try:
            if self.request.path.startswith('/request1'):
                sessionId = myServer.GetNewSessionId()
                self.SendContent('text/plain',sessionId)
        except Exception as error:
            self.SendError(error)
    def get(self,sessionId):
        try:
            if self.request.path.startswith('/request2'):
                textMessage = myServer.GetMsg(sessionId)
                self.SendContent('text/plain',textMessage)
        except Exception as error:
            self.SendError(error)
    def SendContent(self,contentType,contentData):
        self.response.set_status(200)
        self.response.headers['content-type'] = contentType
        self.response.headers['cache-control'] = 'no-cache'
        self.response.write(contentData)
    def SendError(self,error):
        self.response.set_status(500)
        self.response.write(error.message)

Here is the internal implementation of my server:

class MyServer():
    def __init__(self):
        self.sessions = SessionsTable.ReadSessions()
    def GetNewSessionId(self):
        while True:
            sessionId = ... # a 16-digit random number
            if SessionsTable.ReserveSession(sessionId):
                self.sessions[sessionId] = ... # a text message
                SessionsTable.WriteSession(self.sessions,sessionId)
                return sessionId
    def GetMsg(self,sessionId):
        return self.sessions[sessionId]

And finally, here is the data-base maintenance within my server:

from google.appengine.ext import db

class SessionsTable(db.Model):
    message = db.TextProperty()
    @staticmethod
    def ReadSessions():
        sessions = {}
        for session in SessionsTable.all():
            sessions[session.key().name()] = session.message
        return sessions
    @staticmethod
    @db.transactional
    def ReserveSession(sessionId):
        if not SessionsTable.get_by_key_name(sessionId):
            SessionsTable(key_name=sessionId,message='').put()
            return True
        return False
    @staticmethod
    def WriteSession(sessions,sessionId):
        SessionsTable(key_name=sessionId,message=sessions[sessionId]).put()
    @staticmethod
    def EraseSession(sessionId):
        SessionsTable.get_by_key_name(sessionId).delete()

The exception itself indicates an illegal access to the sessions dictionary using the sessionId key. From my observation, it occurs only when the client-server sequence described at the beginning of this question is initiated after the server "has been sleeping" for a relatively long period of time (like a few days or so). It may provide some sort of clue to the source of this problem, though I am unable to see it.

My questions:

  1. Is there anything visibly wrong with my design?

  2. Has anyone experienced a similar problem on GAE?

  3. Does anyone see an obvious solution, or even a debugging method that might help to understand this problem?

Thanks

barak manos
  • Use of class level variables is probably a bad idea. If you have threading enabled then you could be leaking info. – Tim Hoffman Jan 23 '14 at 05:53
  • Are you referring to `class SessionsTable`? I don't have instances of this class, so there shouldn't be any problem here, right? Or did you mean that there might be a problem with the global variable `myServer`, which is not protected properly? I am not using any threads, so do I have a reason to "fear" that GAE creates them implicitly? Thanks – barak manos Jan 23 '14 at 06:13
  • All of those things could be a problem. If you have threading enabled, then App Engine will implicitly create threads to serve each request (if the instance is responding quickly). So you must design your app to be thread-safe. Never store things at the module level, or as a class attribute, unless they can be shared; i.e. it's OK as an instance-level cache for shared data. – Tim Hoffman Jan 23 '14 at 08:45

1 Answer

You are wrongly assuming that all requests are handled by the same instance. This isn't the case at all: GAE, like most hosting environments, makes no guarantee about which server process will handle any given request.

Your implementation is using a module-level variable, myServer, which is a class instance with its own instance variables. But each server process will have its own instance of myServer, and they're not shared between processes: so a dictionary entry that's created in one request won't necessarily exist in a second.
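The failure can be reproduced without GAE at all. The following is only a sketch: `FAKE_DATASTORE` and `SnapshotServer` are hypothetical stand-ins (a plain dict in place of the datastore, and a class that mimics how `MyServer` loads everything once at start-up), not the asker's actual code.

```python
# Hypothetical stand-in for the datastore: one dict shared by every
# simulated "instance" in this sketch.
FAKE_DATASTORE = {}


class SnapshotServer(object):
    """Mimics MyServer: copies all sessions into memory once, at start-up."""

    def __init__(self):
        self.sessions = dict(FAKE_DATASTORE)

    def new_session(self, session_id, message):
        self.sessions[session_id] = message
        FAKE_DATASTORE[session_id] = message  # persisted, like WriteSession

    def get_msg(self, session_id):
        # Looks only in this instance's memory, never in the shared store.
        return self.sessions[session_id]


instance_a = SnapshotServer()
instance_b = SnapshotServer()  # already alive before the session is created

instance_a.new_session('1234567890123456', 'hello')  # the POST lands on A

try:
    instance_b.get_msg('1234567890123456')  # the GET lands on B
    outcome = 'found'
except KeyError:
    outcome = 'KeyError'

print(outcome)
```

Instance B took its snapshot before the session existed, so the lookup fails even though the session was written to the shared store.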

You'll need to look at ways of persisting this data across instances. Really, that's what the datastore is for in the first place; if you're worried about overhead, you should investigate using memcache.
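One way to do this is to treat the in-memory dictionary as a cache only, and fall back to the shared store on a miss. The sketch below again uses hypothetical stand-ins (`FAKE_DATASTORE`, `ReadThroughServer`) so that it runs anywhere; on GAE the fallback lookup would presumably be something like `SessionsTable.get_by_key_name(sessionId)`, which the question already uses.

```python
# Hypothetical stand-in for the SessionsTable entities in the datastore.
FAKE_DATASTORE = {}


class ReadThroughServer(object):
    """Keeps a per-instance cache, but the shared store is the source of truth."""

    def __init__(self):
        self.cache = {}

    def new_session(self, session_id, message):
        FAKE_DATASTORE[session_id] = message  # write to the shared store first
        self.cache[session_id] = message

    def get_msg(self, session_id):
        if session_id not in self.cache:
            # Cache miss: another instance may have created this session,
            # so ask the shared store before giving up.
            message = FAKE_DATASTORE.get(session_id)
            if message is None:
                raise KeyError(session_id)
            self.cache[session_id] = message
        return self.cache[session_id]


instance_a = ReadThroughServer()
instance_b = ReadThroughServer()

instance_a.new_session('1234567890123456', 'hello')  # the POST lands on A
result = instance_b.get_msg('1234567890123456')      # the GET lands on B

print(result)
```

With this shape, it no longer matters which instance serves the GET. The local cache only saves a round trip when the same instance happens to serve both requests; it also needs invalidation if sessions are ever erased, which this sketch leaves out.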

Daniel Roseman
  • Thanks. I did take into account the possibility of several server instances running concurrently, but I didn't connect it to the case where one server would handle the first (POST) request and another server **which is already alive** would handle the second (GET) request. I defined the client requests to be synchronous, and I guess that I've implicitly assumed a "synchronous" behavior on the server side as well. – barak manos Jan 22 '14 at 16:22
  • One question though: the server is very rarely "interrupted" (my website is not that popular just yet), and if anything, I had to deal with the case of 0 server instances at a certain point (which has led me to the 'stateless' design). Why would GAE create two instances for an application that is hardly serving anyone? The reason I'm asking this is, I suspect there might be yet another problem at hand. Thanks. – barak manos Jan 22 '14 at 16:25
  • 1
    It happens all the time, they fire up one instance, a second request comes in during startup, and is not handled then a second one starts up. then who knows which instance will service the next request. – Tim Hoffman Jan 23 '14 at 05:51