0

I have a task queue that handles deferred tasks that need to be processed in real time (I have heard this isn't a good idea but can't find any supporting documentation on this.). This queue is hit with an large influx of tasks (2500/min) and then often receives nothing for a minute or two.

queue.yaml

- name: event-message
  rate: 200/s
  bucket_size: 200
  retry_parameters:
      task_retry_limit: 2
      min_backoff_seconds: 1

What I have noticed is that I often get "Process terminated because the request deadline was exceeded. (Error code 123)" messages for some tasks that aren't even started. The task sits in the queue for 10min without being run and then times out. I thought it may be related to my queue configuration so I have included that here.

2015-06-18 10:22:45.842 /_ah/queue/deferred 500 641080ms 0kb AppEngine-Google; (+http://code.google.com/appengine) module=default version=1-0-21
0.1.0.2 - - [18/Jun/2015:07:22:45 -0700] "POST /_ah/queue/deferred HTTP/1.1" 500 0 "http://*******/_ah/queue/deferred" "AppEngine-Google; (+http://code.google.com/appengine)" "********" ms=641081 cpu_ms=0 queue_name=event-message task_name=0477163503045474472 exit_code=123 app_engine_release=1.9.22 instance=00c61b117c8af1f4df1247032afd14111a19f302
E 2015-06-18 10:22:45.842
Process terminated because the request deadline was exceeded. (Error code 123)

Can anyone explain this? Any ideas how I can figure out why these tasks are never run?

So I was looking at the logs more and found additional support for my theory that the tasks weren't even starting. Here the "This request caused a new process to be started for your application" appears after the task 10min deadline has been reached.

2015-06-24 17:16:59.554 /_ah/queue/deferred 500 641631ms 0kb AppEngine-Google; (+http://code.google.com/appengine) module=default version=1-0-27
0.1.0.2 - - [24/Jun/2015:14:16:59 -0700] "POST /_ah/queue/deferred HTTP/1.1" 500 0 "https://integrate.versature.com/api/netsapiens_events/96615760-cac7-40d6-b80d-13a1ffade2c8/" "AppEngine-Google; (+http://code.google.com/appengine)" "integrate.versature.com" ms=641632 cpu_ms=0 queue_name=event-message task_name=59067488044178506841 pending_ms=1061 exit_code=123 app_engine_release=1.9.22 instance=00c61b117c8f2db4ade6be8d30165cb84f0a17ab
E 2015-06-24 17:16:59.554
Process terminated because the request deadline was exceeded. (Error code 123)

Here is another:

I 2015-06-23 13:48:22.970
X-Appengine-Taskretrycount:0, X-Appengine-Tasketa:1435081701.5576639, X-Appengine-Default-Namespace:versature.com, X-Appengine-Taskname:89115825957976957021, X-Appengine-Taskexecutioncount:0, X-Appengine-Queuename:event-message, X-Appengine-Current-Namespace:, X-Appengine-Country:ZZ

E 2015-06-23 13:58:21.669
Traceback (most recent call last):
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/runtime/wsgi.py", line 267, in Handle
result = handler(dict(self._environ), self._StartResponse)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
rv = self.router.dispatch(request, response)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
return route.handler_adapter(request, response)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
return handler.dispatch()
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
return method(*args, **kwargs)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/deferred/deferred.py", line 310, in post
self.run_from_request()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/deferred/deferred.py", line 305, in run_from_request
run(self.request.body)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/ext/deferred/deferred.py", line 147, in run
return func(*args, **kwds)
File "/base/data/home/apps/s~versature-integrate-2/1-0-26.385226646541088482/app/api/resources/constructs/netsapiens_event/process_netsapiens_event.py", line 66, in process_data_for_endpoint
logging.info("Process data for specific endpoint. Endpoint Key: %s", endpoint_key_url_safe)
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/logging/__init__.py", line 1611, in info
root.info(msg, *args, **kwargs)
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/logging/__init__.py", line 1149, in info
self._log(INFO, msg, args, **kwargs)
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/logging/__init__.py", line 1268, in _log
self.handle(record)
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/logging/__init__.py", line 1278, in handle
self.callHandlers(record)
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/logging/__init__.py", line 1318, in callHandlers
hdlr.handle(record)
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/logging/__init__.py", line 749, in handle
self.emit(record)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/app_logging.py", line 79, in emit
self._AppLogsLocation())
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/logservice/logservice.py", line 457, in write_record
logs_buffer().write_record(level, created, message, source_location)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/logservice/logservice.py", line 270, in write_record
message = cleanup_message(message)
DeadlineExceededError

I 2015-06-23 13:58:21.684
This request caused a new process to be started for your application, and thus caused your application code to be loaded for the first time. This request may thus take longer and use more CPU than a typical request for your application.
David Ward
  • 493
  • 3
  • 15
  • How did you conclude that the task isn't running? The log shows the request taking 641080ms which is a bit over the 10 min task queue execution time limit - probably when it was terminated. Maybe you're attempting to do too much stuff in one task or some endless loop is lurking in that stuff. – Dan Cornilescu Jun 19 '15 at 02:07
  • I concluded that the task isn't running as none of the logging messages associated with that deferred task ever appear. The only thing I get for these tasks is the Process terminated 10min later. I assumed that even if the process terminated I would still see my log messages. Is this not true? What do you mean by too much stuff? In general the deferred task executes within a second (except for the few that timeout). – David Ward Jun 19 '15 at 02:39
  • An additional thought. The deferred task is passed a complex object that lazily loads and prepares a result. I guess it is possible there is something in there that causing an issue but without logging it's hard to know what. – David Ward Jun 19 '15 at 02:40
  • The lack of logs *might* be an indication of something going wrong during processing. Maybe related - this Q&A suggests the deferred library might not be a good fit for high throughput apps: http://stackoverflow.com/questions/4909278/google-app-engine-use-task-queues-or-deferred-jobs. Just quessing, tho. – Dan Cornilescu Jun 19 '15 at 03:27
  • Ok so I moved away from the deferred library. I am using the taskqueue library myself now. So that should allow for a higher throughput as documented by google. I am having the exact same issue though. I end up with 7 or 8 tasks that don't seem to start. These tasks just sit in the queue and then time out after 10 minutes. Any ideas? – David Ward Jun 22 '15 at 15:36
  • This makes me supect a bug even more. Are those 7-8 tasks a tiny minority of the tasks being processed (while the others being processed as expected)? If so - can you isolate the 7-8 ones and repro only with them? Might alo help is you post the task queue handler code – Dan Cornilescu Jun 22 '15 at 17:41
  • Yes the others are being processed perfectly. The 7-8 tasks just don't start. I could try to capture the 2500 ish tasks that go through in one shot and see about replaying them to see if it's the same ones each time or random tasks. I will give that a shot. If that doesn't turn up anything I will post the task queue handler code. – David Ward Jun 22 '15 at 18:04
  • I am thinking at this point that maybe new instances of my app are being initialized but they are loading too slowly. I'm going to pursue this theory and see where it takes me. – David Ward Jun 24 '15 at 15:16
  • Yes, it's a possibility as well, mentioned in this guide: https://cloud.google.com/appengine/articles/deadlineexceedederrors. Logging itself is listed, too (seeing it in the traceback you added). – Dan Cornilescu Jun 24 '15 at 15:26
  • Yes, I was just reading that guide. I removed the logging and uploaded a new version of the app. I'm waiting to see if that has any effect on the DeadlineErrors. I'll post what I find out. – David Ward Jun 24 '15 at 15:27
  • The strange thing is that these Deadline Errors are being raised on tasks starting from a Task Queue. They should have 10 minutes in which to run. Yet they seem to be stuck in the pending request queue. – David Ward Jun 24 '15 at 15:43
  • A closer look at your traceback snippet shows an `I` msg at 13:48:22.970 showing the start of the task, followed by the `E` msg preceeding the actual traceback at 13:58:21.669 (i.e. ~10 min later). And the traceback includes this call from the `process_netsapiens_event.py` file (which I *think* is from your app, if true then it is proof that your code actually executes): `logging.info("Process data for specific endpoint. Endpoint Key: %s", endpoint_key_url_safe)` – Dan Cornilescu Jun 25 '15 at 04:57
  • Yes it did run in this instance. But this is the only one I found like this. Every other failure just has the E DeadlineExceededError with no I messages at all. I added such an example to the description above. – David Ward Jun 25 '15 at 12:22
  • See my 1st comment, 641631ms in this last snippet. I think the only difference between the 2 cases is that in one the app manages to catch the DeadlineExceededError exception and print the traceback while in the other it doesn't. Per that guide: `If the DeadlineExceededError is caught but a response is not produced quickly enough (you have less than a second), the request is aborted and a 500 internal server error is returned` – Dan Cornilescu Jun 25 '15 at 13:18
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/81557/discussion-between-david-ward-and-dan-cornilescu). – David Ward Jun 25 '15 at 16:09

0 Answers0