
Here's my section of code, my logs, and an App Stats screenshot. I'm trying to minimize the time it takes a user to get information back from a query, but I'm getting bogged down by overhead. I'm running an F1 instance with automatic_scaling, if that helps. It's hard to keep the app responsive, and I'm not sure how to fix the performance here. If there's something else I should be doing, I'd really appreciate it if someone could tell me. Thanks.

Code for post request:

@ndb.toplevel
def post(self):
    start_time = time.time()
    search_query = escape(self.request.get('search_query'))
    lat = escape(self.request.get('lat'))
    long = escape(self.request.get('long'))
    query_id = escape(self.request.get('query_id'))
    query_location = escape(self.request.get('location'))
    logging.debug("Getting self vars took %s seconds" % (time.time() - start_time))
    session_id = MD5.new(str(datetime.datetime.now())).hexdigest()  # to change later
    logging.debug("To get to MD5 gen took %s seconds" % (time.time() - start_time))

    index = search.Index(name=INDEX_NAME)
    query_string = search_query + " distance(location,geopoint(" + lat + "," + long + ")) < 30000"

    # TODO: this block will need to change if we already have a cursor
    results = index.search(search.Query(
        query_string=query_string,
        options=search.QueryOptions(
            limit=50,
            cursor=search.Cursor(),
            )))
    cursor = results.cursor  # if not None, then there are more results

    logging.debug("To get to index search took %s seconds" % (time.time() - start_time))

    search_results_pickle = jsonpickle.encode(results, unpicklable=False)

    logging.debug("To get to pickling took %s seconds" % (time.time() - start_time))

    key_names = []
    for tmpResult in results:
        key_names.append(tmpResult.field("location_id").value)
    # put values into memcache

    logging.debug("To get to database key gen took %s seconds" % (time.time() - start_time))

    # Note: @ndb.toplevel makes the handler wait for these async gets to
    # finish before the response is sent, so their cost lands after the
    # final log line.
    ndb.get_multi_async([ndb.Key('locationDatabase', key_name) for key_name in key_names])

    logging.debug("To get to database async lookup took %s seconds" % (time.time() - start_time))

    context = {
        'session_id': session_id,
        'lat': lat,
        'long': long,
        'location': query_location,
        'query_id': query_id,
        'search_results': search_results_pickle,
        }

    logging.debug("To get to context generation took %s seconds" % (time.time() - start_time))

    template = template_env.get_template('templates/map_no_jinja.html')

    logging.debug("To get to template HTML gen took %s seconds" % (time.time() - start_time))

    self.response.out.write(template.render(context))

    logging.debug("To get to writing output HTML took %s seconds" % (time.time() - start_time))
    return

Logs (request took 1867 ms according to the App Engine logs):

D 2014-07-22 14:29:24.112 Getting self vars took 0.00068998336792 seconds
D 2014-07-22 14:29:24.112 To get to MD5 gen took 0.0011100769043 seconds
D 2014-07-22 14:29:24.732 To get to index search took 0.620759963989 seconds
D 2014-07-22 14:29:24.749 To get to pickling took 0.637880086899 seconds
D 2014-07-22 14:29:24.750 To get to database key gen took 0.638570070267 seconds
D 2014-07-22 14:29:24.772 To get to database async lookup took 0.661170005798 seconds
D 2014-07-22 14:29:24.772 To get to context generation took 0.661360025406 seconds
D 2014-07-22 14:29:24.835 To get to template HTML gen took 0.723920106888 seconds
D 2014-07-22 14:29:24.836 To get to writing output HTML took 0.725300073624 seconds

That last logging.debug() statement is right before the return statement.

App Stats: [screenshot]

EDIT:

Just for kicks I upped the instance class to F4, and with three instances running (this was the only request my server was handling at the time), I'm still seeing ~1 second of overhead that's independent of RPC calls!


Corresponding logs (1893 ms in the header):

D 2014-07-22 15:29:46.261 Getting self vars took 0.000779867172241 seconds
D 2014-07-22 15:29:46.261 To get to MD5 gen took 0.00116991996765 seconds
D 2014-07-22 15:29:46.929 To get to index search took 0.668219804764 seconds
D 2014-07-22 15:29:46.949 To get to pickling took 0.689079999924 seconds
D 2014-07-22 15:29:46.950 To get to database key gen took 0.690039873123 seconds
D 2014-07-22 15:29:46.961 To get to database async lookup took 0.701029777527 seconds
D 2014-07-22 15:29:46.962 To get to context generation took 0.701309919357 seconds
D 2014-07-22 15:29:46.962 To get to template HTML gen took 0.70161986351 seconds
D 2014-07-22 15:29:46.963 To get to writing output HTML took 0.702219963074 seconds
user3058197
  • Probably it's still running your async operation? Do you use the `ndb.toplevel` decorator? – marcadian Jul 22 '14 at 18:47
  • Yes, see the beginning of the code. It seems hard to believe that a `get_multi` run would take 1 second. – user3058197 Jul 22 '14 at 18:49
  • I think I was wrong; if it's a db call, it should appear in your App Stats. I've seen this kind of App Stats trace too; in my experience the gap can be as large as 7 seconds (longer than the total request itself if you remove the gap!) – marcadian Jul 22 '14 at 19:14
  • Thanks for the reply -- so... what can I do about it then? :) Is this just something that is inherent to App Engine? – user3058197 Jul 22 '14 at 19:18
  • Actually it's a slightly different problem than mine, since in my case it's happening intermittently. But it seems in your case your logging shows which part takes the 1s. App Stats only records RPC calls. – marcadian Jul 22 '14 at 20:20
  • I have to think `logging.debug` is a very, very small fraction of that 1s. The only reason I added those statements was to try to diagnose where the lag could be coming from. – user3058197 Jul 22 '14 at 20:40
  • Oh, I misread your log; it says "... took xxx seconds", which is actually the duration from the beginning. Sorry. – marcadian Jul 22 '14 at 20:59
  • I would store the futures from the async get, wait for them at the end, and then add an additional log message. I bet this is far more expensive than you hope. I suspect it will show up in App Stats at that point. You should be able to add a context that limits the amount of time you are willing to wait. This will prep the caches without slowing you down. – TheJacobTaylor Jul 23 '14 at 00:30
  • Thanks for the reply. Can you elaborate on this a bit? 1) "Store the futures" -- do you mean place this in the datastore/memcache somewhere? 2) "Wait for them at the end" -- do you mean use `future_obj.wait()` after my `response`? I was under the impression that this wouldn't help because the browser won't start loading the next page until after the method finishes, right? – user3058197 Jul 23 '14 at 00:38
  • `future_list = ndb.get_multi_async(keys)`; later you can run `ndb.Future.wait_all(future_list)` to block on it. If you wrap appropriate log messages around it, you should see your delay if this is the cause. It should also show up in App Stats. This does not fix your problem, but it will confirm that this is, in fact, your problem. If it is, and you do not need the data yet, you can set up a context object with a very short timeout. Prime the caches without waiting. To be clear, no bytes of the response get sent to the browser until this function returns. – TheJacobTaylor Jul 23 '14 at 04:21
  • Thanks. The `ndb` step is in fact what is causing a lot of delays -- it seems other people are running into [something similar](http://stackoverflow.com/questions/24030855/performance-issues-with-app-engine-memcache-ndb-get-multi/) – user3058197 Jul 23 '14 at 18:13
  • `you can setup a context object with a very short timeout. Prime the caches without waiting.` Sorry, can you clarify what that means, specifically "prime the caches"? All I'm doing with this query is trying to get data into cache so that I can reference it quickly later with AJAX requests to the server. Would setting up a context object with a very short timeout put my query results into cache for later? – user3058197 Jul 23 '14 at 18:42
  • "Prime the caches" was referring to getting items into cache for fast retrieval later, i.e. you do not need the data now but will soon. If you pass `deadline=0.1` as an argument to `ndb.get_multi_async`, it should throw an exception and let you resume after 1/10th of a second. This will let you pull some of the data into cache without waiting a full second. Given the performance I have been seeing, it looks like you are reading lots of records. Do you need all of these in cache? – TheJacobTaylor Jul 24 '14 at 20:33
  • @user3058197 ping based on previous comment. forgot – TheJacobTaylor Jul 25 '14 at 00:15
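The diagnostic and the cache-priming idea from the comments above can be sketched as follows. Since the App Engine SDK isn't importable outside GAE, this uses `concurrent.futures` as a stand-in for `ndb` futures; the real calls would be `future_list = ndb.get_multi_async(keys)`, `ndb.Future.wait_all(future_list)`, and (for the priming variant) a short `deadline` passed to `ndb.get_multi_async`. The keys, latency, and helper names here are illustrative only.

```python
import concurrent.futures
import logging
import time

logging.basicConfig(level=logging.DEBUG)

def slow_fetch(key):
    # Stand-in for one datastore entity get (~0.2 s of simulated RPC latency).
    time.sleep(0.2)
    return {"key": key}

executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)
keys = ["loc-%d" % i for i in range(4)]

# Fire the async gets; in ndb this would be:
#   future_list = ndb.get_multi_async(keys)
future_list = [executor.submit(slow_fetch, k) for k in keys]

# ... build and write the response here, overlapping with the fetches ...

# Block explicitly and log the cost of the wait; in ndb this would be:
#   ndb.Future.wait_all(future_list)
t0 = time.time()
concurrent.futures.wait(future_list)
wait_cost = time.time() - t0
logging.debug("waiting on futures took %s seconds", wait_cost)

# Cache-priming variant from the comments: wait only briefly, then move on.
# In ndb the analogue is a short deadline (e.g. deadline=0.1) passed to
# get_multi_async, catching the resulting deadline exception.
primed = [executor.submit(slow_fetch, k) for k in ("loc-x", "loc-y")]
done, pending = concurrent.futures.wait(primed, timeout=0.05)
executor.shutdown(wait=True)  # pending fetches still complete in the background
```

Wrapping log lines around the explicit wait, as suggested, makes the otherwise invisible post-handler blocking show up in the timings.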

1 Answer


It looks like the major delay is between a memcache.get and a memcache.set, which is where you're pickling the search results using jsonpickle. That might be what's taking so long. How much data do you get back from the search results? Try isolating the pickle operation and timing it on its own.

bigblind
  • I checked the timestamps right around the pickling step: `D 2014-07-22 15:29:46.929 To get to index search took 0.668219804764 seconds` and `D 2014-07-22 15:29:46.949 To get to pickling took 0.689079999924 seconds`. Subtracting the two, it's ~21 ms. `getsizeof(search_results_pickle)` on a sample run I just did yielded 9,924 bytes. – user3058197 Jul 22 '14 at 21:20
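The answer's suggestion to isolate the pickle step can be checked with a standalone timing like the one below. The sample data is fabricated to roughly match the ~10 KB payload reported in the comment, and stdlib `json` stands in for `jsonpickle.encode(results, unpicklable=False)` so the snippet runs without the jsonpickle package installed.

```python
import json
import sys
import time

# Fabricated sample results, sized roughly like the ~10 KB payload
# reported in the comment (the real object is a search results list).
results = [{"location_id": "loc-%04d" % i,
            "name": "Sample location %d" % i,
            "lat": 40.0 + i * 0.001,
            "lng": -74.0 - i * 0.001} for i in range(50)]

t0 = time.time()
# In the handler this is jsonpickle.encode(results, unpicklable=False);
# stdlib json is used here as a stand-in.
encoded = json.dumps(results)
pickle_cost = time.time() - t0

print("encoded %d bytes in %.6f seconds" % (sys.getsizeof(encoded), pickle_cost))
```

For a payload this small the encode time is a few milliseconds at most, which matches the ~21 ms measured between the two log lines and rules pickling out as the source of the ~1 s gap.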