Asynchronous URLfetch when we don't care about the result? [Python]

Question

In some code I'm writing for GAE I need to periodically perform a GET on a URL on another system, in essence 'pinging' it, and I'm not terribly concerned whether the request fails, times out, or succeeds.

As I basically want to 'fire and forget' and not slow down my own code by waiting for the request, I'm using an asynchronous urlfetch, and not calling get_result().

In my log I get a warning:

Found 1 RPC request(s) without matching response (presumably due to timeouts or other errors)

Am I missing an obviously better way to do this? A Task Queue or Deferred Task seems (to me) like overkill in this instance.

Any input would be appreciated.

You should instrument to determine whether the URLFetch operation is completing, and if it does, whether that happens before or after your request returns to the caller. I think you will find that it *does* finish, and that the `wait` happens implicitly *after* the original request returns its result. However, I haven't found explicit documentation for this anywhere, so the behavior may be subject to change. — technomage, Jan 18 '14 at 14:02

score 7 · Accepted Answer · answered Mar 23 '11 at 22:55

A task queue task is your best option here. The message you're seeing in the log indicates that the request is waiting for your URLFetch to complete before returning, so this doesn't help. You say a task is 'overkill', but really, they're very lightweight, and definitely the best way to do this. Deferred will even allow you to just defer the fetch call directly, rather than having to write a function to call.
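
A minimal sketch of deferring the fetch call directly, assuming the deferred library is enabled for the app. The helper name `enqueue_ping` and its `defer`/`fetch` parameters are hypothetical; they exist only so the sketch can be exercised outside App Engine, and they default to `deferred.defer` and `urlfetch.fetch`.

```python
def enqueue_ping(url, deadline=5, defer=None, fetch=None):
    """Queue a fire-and-forget GET of `url` as a deferred task.

    `defer` and `fetch` are hypothetical injection points for testing;
    on App Engine they default to deferred.defer and urlfetch.fetch.
    """
    if defer is None or fetch is None:
        # Imported lazily so this module also loads outside App Engine.
        from google.appengine.api import urlfetch
        from google.appengine.ext import deferred
        defer = defer or deferred.defer
        fetch = fetch or urlfetch.fetch
    # deferred.defer(obj, *args, **kwargs) packages the callable and its
    # arguments into a task; the task later runs fetch(url, deadline=deadline).
    defer(fetch, url, deadline=deadline)
```

In a request handler this would be a single call such as `enqueue_ping("http://example.com/ping")`, where the URL is a placeholder. Note that a fetch that raises (for example on a timeout) makes the task fail, and failed tasks are retried by default, which may or may not be what you want for a ping.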

kevpie · Answer 2 · 2011-03-24T00:38:54.540

How long does it take for the async_url_fetch to complete and how long does it take to provide your response?

Here is a possible approach that leverages the way the API works in Python.

Some points to consider.

Many web servers and reverse proxies will not cancel a request once it has been started. So if the remote server you are pinging queues the request but takes a long time to service it, set a deadline with create_rpc(deadline=X) so that your call returns after X seconds due to the timeout. The ping may still succeed. This technique works against App Engine itself as well.
GAE RPCs
- RPCs, after being queued via make_call/make_fetch_call, are actually only dispatched once one of them is waited on.
- Also, any RPC that has just finished will have its callback called when the one currently being waited on finishes.
- You can create an async_urlfetch RPC and enqueue it using make_fetch_call as early as possible in handling your request; don't wait on it yet.
- Do the actual page-serving work, such as memcache/datastore calls. The first call to one of these will perform a wait, which will dispatch your async_urlfetch.
- If the urlfetch completes during this other activity, its callback will be called, allowing you to handle the result.
- If you do call get_result(), it blocks in wait() until the result arrives or the deadline expires, unless the result is already available.

To recap.

Prepare the long-running url_fetch with a reasonable deadline and callback. Enqueue it using make_fetch_call. Do the work you wanted to do for the page. Return the page regardless of whether the url_fetch completed or hit its deadline, and without waiting for it.
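
The recap could be sketched roughly as follows. This is an illustration under assumptions, not the code behind the logs further down: `start_ping`, `on_done`, and the `urlfetch` parameter are hypothetical names, and the API module is passed in (on App Engine it would be `google.appengine.api.urlfetch`) so the sketch can also run outside App Engine. RPC callbacks take no arguments, so the callback closes over the RPC object.

```python
import logging


def start_ping(urlfetch, url, deadline=2):
    """Queue a GET without waiting on it and return the RPC object.

    `urlfetch` is the API module, e.g. google.appengine.api.urlfetch;
    passing it in is an assumption made to keep this sketch testable.
    """
    rpc = urlfetch.create_rpc(deadline=deadline)

    def on_done():
        # RPC callbacks receive no arguments, so close over `rpc`.
        try:
            result = rpc.get_result()  # RPC has finished; raises on failure
            logging.info("ping %s -> %s", url, result.status_code)
        except Exception:
            # Fire and forget: a timeout or fetch error is only logged.
            logging.info("ping %s failed or timed out", url)

    rpc.callback = on_done
    urlfetch.make_fetch_call(rpc, url)
    return rpc  # deliberately no rpc.wait() or rpc.get_result() here
```

A handler would call `start_ping(urlfetch, url)` as early as possible, do its normal work, and return without touching the RPC again. As the second log in this answer shows, the "Found 1 RPC request(s) without matching response" warning can still appear when the RPC is never waited on.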

The underlying RPC layer in GAE is entirely asynchronous, and a more sophisticated way to choose what you wish to wait on seems to be in the works.

These examples use sleep and a url_fetch to a second instance of the same app.

Example of wait() dispatching rpc work:

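import logging
from time import sleep

from google.appengine.api.urlfetch import create_rpc, make_fetch_call
from tipfy import RequestHandler  # assumed from the tipfy traceback in the log

_log = logging.getLogger(__name__)  # assumed logger setup


class AsyncHandler(RequestHandler):

    def get(self, sleepy=0.0):
        _log.info("create rpc")
        rpc = create_rpc()
        _log.info("make fetch call")
        # url will generate a 404
        make_fetch_call(rpc, url="http://<myapp>.appspot.com/hereiam")
        _log.info("sleep for %r", sleepy)
        sleep(sleepy)
        _log.info("wait")
        rpc.wait()  # blocks until the fetch completes or its deadline passes
        _log.info("get_result")
        rpc.get_result()  # a 404 does not raise; the response is discarded
        _log.info("return")
        return "<BODY><H1>Holla %r</H1></BODY>" % sleepy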

Calling wait() after sleeping for 4 seconds shows when the fetch is dispatched:

2011-03-23 17:08:35.673 /delay/4.0 200 4093ms 23cpu_ms 0kb Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_7; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.151 Safari/534.16,gzip(gfe)
I 2011-03-23 17:08:31.583 create rpc
I 2011-03-23 17:08:31.583 make fetch call
I 2011-03-23 17:08:31.585 sleep for 4.0
I 2011-03-23 17:08:35.585 wait
I 2011-03-23 17:08:35.663 get_result
I 2011-03-23 17:08:35.663 return
I 2011-03-23 17:08:35.669 Saved; key: __appstats__:011500, part: 48 bytes, full: 4351 bytes, overhead: 0.000 + 0.006; link: http://<myapp>.appspot.com/_ah/stats/details?tim
2011-03-23 17:08:35.636 /hereiam 404 9ms 0cpu_ms 0kb AppEngine-Google; (+http://code.google.com/appengine; appid: s~<myapp>),gzip(gfe)

Log for the asynchronously dispatched call:

E 2011-03-23 17:08:35.632 404: Not Found Traceback (most recent call last): File "distlib/tipfy/__init__.py", line 430, in wsgi_app rv = self.dispatch(request) File "di
I 2011-03-23 17:08:35.634 Saved; key: __appstats__:015600, part: 27 bytes, full: 836 bytes, overhead: 0.000 + 0.002; link: http://<myapp>.appspot.com/_ah/stats/details?time

Using a memcache RPC's wait() to kick off the work:

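import logging
from time import sleep

from google.appengine.api import memcache
from google.appengine.api.urlfetch import create_rpc, make_fetch_call
from tipfy import RequestHandler  # assumed from the tipfy traceback in the log

_log = logging.getLogger(__name__)  # assumed logger setup


class AsyncHandler(RequestHandler):

    def get(self, sleepy=0.0):
        _log.info("create rpc")
        rpc = create_rpc()
        _log.info("make fetch call")
        make_fetch_call(rpc, url="http://<myapp>.appspot.com/hereiam")
        _log.info("sleep for %r", sleepy)
        sleep(sleepy)
        _log.info("memcache's wait")
        memcache.get('foo')  # unrelated RPC; its wait is what is being shown
        _log.info("sleep again")
        sleep(sleepy)
        _log.info("return")  # the urlfetch RPC is never waited on explicitly
        return "<BODY><H1>Holla %r</H1></BODY>" % sleepy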

App Engine production log:

2011-03-23 17:27:47.389 /delay/2.0 200 4018ms 23cpu_ms 0kb Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_7; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.151 Safari/534.16,gzip(gfe)
I 2011-03-23 17:27:43.374 create rpc
I 2011-03-23 17:27:43.375 make fetch call
I 2011-03-23 17:27:43.377 sleep for 2.0
I 2011-03-23 17:27:45.378 memcache's wait
I 2011-03-23 17:27:45.382 sleep again
I 2011-03-23 17:27:47.382 return
W 2011-03-23 17:27:47.383 Found 1 RPC request(s) without matching response (presumably due to timeouts or other errors)
I 2011-03-23 17:27:47.386 Saved; key: __appstats__:063300, part: 66 bytes, full: 6869 bytes, overhead: 0.000 + 0.003; link: http://<myapp>.appspot.com/_ah/stats/details?tim
2011-03-23 17:27:45.452 /hereiam 404 10ms 0cpu_ms 0kb AppEngine-Google; (+http://code.google.com/appengine; appid: s~<myapp>),gzip(gfe)

Async urlfetch dispatched when memcache.get calls wait():

E 2011-03-23 17:27:45.446 404: Not Found Traceback (most recent call last): File "distlib/tipfy/__init__.py", line 430, in wsgi_app rv = self.dispatch(request) File "di
I 2011-03-23 17:27:45.449 Saved; key: __appstats__:065400, part: 27 bytes, full: 835 bytes, overhead: 0.000 + 0.002; link: http://<myapp>.appspot.com/_ah/stats/details?time

This is not a simple solution, but is intended to provide some food for thought. — kevpie, Mar 23 '11 at 23:10
Good point about the timeouts. I'm fairly certain you're wrong about RPCs only being dispatched once wait is called, though - that only applies in the dev_appserver to the best of my knowledge. — Nick Johnson, Mar 23 '11 at 23:11
@Nick, I had to double check. The documentation in the source may currently be predicting the future or slightly misleading. This is something almost nobody would come up against unless they are seriously down in the weeds of doing async api calls. I had a dream after building asynctools to create an async page fragment resolver that did everything in parallel. A year ago I was working on a tool that made dozens of signed api calls between facebook/twitter/linkedin with multiple oauth key/public fallbacks in real time. The parallelism that the GAE RPC layer provides is simply astounding. — kevpie, Mar 24 '11 at 00:46
@John, This is really out of control now. Use Nick's suggestion: if it is occasional, defer it. If it is periodic, cron it. — kevpie, Mar 24 '11 at 00:56
Thanks Kev, good stuff here. Really appreciate the effort you put into your answer. — John Carter, Mar 24 '11 at 04:10
The point about RPCs only being dispatched once `get_result()` or `wait` is called is incorrect. — technomage, Jan 18 '14 at 13:48
@technomage At that time (2009-2011) it was single threaded; those were the points at which control was passed to the RPC layer. I can't speak to how it works now. — kevpie, Jan 18 '14 at 20:47
