
On my local machine the script runs fine, but in the cloud it returns a 500 error every time. It's a cron task, so I don't really mind if it takes 5 minutes...

<class 'google.appengine.runtime.DeadlineExceededError'>:

Any idea whether it's possible to increase the timeout?

Thanks, rui

rui

3 Answers


You cannot go beyond 30 seconds, but you can indirectly increase the timeout by employing task queues: write tasks that gradually iterate through your data set and process it. Each such task run should, of course, fit within the timeout limit.

EDIT

To be more specific, you can use datastore query cursors to resume processing in the same place:

http://code.google.com/intl/pl/appengine/docs/python/datastore/queriesandindexes.html#Query_Cursors

first introduced in SDK 1.3.1:

http://googleappengine.blogspot.com/2010/02/app-engine-sdk-131-including-major.html
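
As an illustration of that pattern, here is a minimal sketch using the deferred library, assuming a db.Model subclass called Foo and a per-entity process() helper (both hypothetical, not from the question): each task handles one batch and re-enqueues itself with the cursor where the batch ended, so no single request has to outlive the deadline.

from google.appengine.ext import db, deferred

BATCH_SIZE = 100  # small enough that each task finishes well within the deadline

def process_chunk(start_cursor=None):
  query = Foo.all()
  if start_cursor:
    query.with_cursor(start_cursor)

  processed = 0
  for entity in query.run(limit=BATCH_SIZE):
    process(entity)  # hypothetical per-entity work
    processed += 1

  if processed == BATCH_SIZE:
    # There may be more data: chain another task starting at the new cursor.
    deferred.defer(process_chunk, query.cursor())

# Kick the chain off, e.g. from the cron handler:
# deferred.defer(process_chunk)

Note that the deferred library has to be enabled in app.yaml (its builtin) before deferred.defer can be used.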

Tomasz Zieliński

The exact rules for DB query timeouts are complicated, but it seems that a query cannot live more than about 2 mins, and a batch cannot live more than about 30 seconds. Here is some code that breaks a job into multiple queries, using cursors to avoid those timeouts.

# Foo is assumed to be a db.Model subclass defined elsewhere.
def make_query(start_cursor):
  query = Foo.all()

  if start_cursor:
    query.with_cursor(start_cursor)

  return query

batch_size = 1000
start_cursor = None

while True:
  query = make_query(start_cursor)
  results_fetched = 0

  for resource in query.run(limit=batch_size):
    results_fetched += 1

    # Do something with resource here.

    if results_fetched == batch_size:
      # Remember where this batch ended so the next query resumes there.
      start_cursor = query.cursor()
      break
  else:
    # run() returned fewer than batch_size results, so we are done.
    break
phatmann
  • It's not exactly accurate to say "a query cannot live more than 30 seconds" - see the discussion here, especially comment #8 onwards: https://code.google.com/p/googleappengine/issues/detail?id=12243 – tom Oct 01 '15 at 18:46
  • @tom: So, a batch can run for 30 seconds, but a query can run for about 4 minutes? If so, how do you suggest I edit my answer? – phatmann Oct 06 '15 at 03:35
  • assuming @Patrick Costello is correct (he does work at Google :), I'd suggest you make 2 changes: 1) "The exact rules for DB query timeouts are complicated, but roughly: a query cannot live more than ~2.5 mins, and a batch cannot live more than 30 seconds". Here is some code that breaks the job into multiple queries, using cursors, to avoid those timeouts. change #2) `for resource in query.run(limit = batch_size):` – tom Oct 06 '15 at 18:00
  • @tom, can you explain why I need to add `.run(limit = batch_size)` on the query? – phatmann Oct 09 '15 at 14:04
  • Good question! @Patrick Costello included it in his suggested code, so I'm *assuming* it's correct, but you're right to check! As to why: my *assumption* is that limiting the query to the size of 'our batch' prevents AE from pre-fetching results/batches beyond our limit. My understanding is that GAE automatically fetches the next batch as it processes the current one, but I don't know all the details. Our `break` is not anticipatable by GAE, so I think it stands to reason that GAE can't optimize its DB RPCs w/o the hint that `limit` provides. Maybe someone who isn't guessing can weigh in? – tom Oct 09 '15 at 18:02
  • @tom: thanks for the explanation. I read the docs for the Query class and it says that `run` "returns an iterable for looping over the results of the query". It then goes on to say "If you don't need to change the default argument values, you can just use the query object directly as an iterable to control the loop. This implicitly calls run() with default arguments." Since we are changing the default arguments, we need to call it. Making the edit now. – phatmann Oct 12 '15 at 15:46
  • great! now we need to get this answer as the 'accepted' one. it's certainly more useful than the others. – tom Oct 12 '15 at 20:38
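
To make the `limit` discussion in the comments above concrete, here is an illustrative fragment (Foo and handle() are hypothetical): iterating a query object directly implicitly calls run() with default arguments, so run() has to be called explicitly when you want to pass limit.

query = Foo.all()

# Equivalent to query.run() with default arguments.
for entity in query:
  handle(entity)

# Explicit run() so the batch size can be capped.
for entity in query.run(limit=1000):
  handle(entity)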

Below is the code I use to solve this problem, by breaking up a single large query into multiple small ones. I use the google.appengine.ext.ndb library -- I don't know if that is required for the code below to work.

(If you are not using ndb, consider switching to it. It is an improved version of the db library and migrating to it is easy. For more information, see https://developers.google.com/appengine/docs/python/ndb.)

from google.appengine.datastore.datastore_query import Cursor
from google.appengine.ext import ndb

# MyEntity is assumed to be an ndb.Model subclass.
def ProcessAll():
  curs = Cursor()
  while True:
    records, curs, more = MyEntity.query().fetch_page(5000, start_cursor=curs)
    for record in records:
      # Run your custom business logic on record.
      RunMyBusinessLogic(record)
    if more and curs:
      # There are more records; do nothing here so we enter the
      # loop again above and run the query one more time.
      pass
    else:
      # No more records to fetch; break out of the loop and finish.
      break
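
As a usage sketch (the handler class and route below are illustrative, not part of the original answer), ProcessAll could be invoked from the cron job mentioned in the question via a webapp2 handler, with a matching entry in cron.yaml:

import webapp2

class ProcessAllHandler(webapp2.RequestHandler):
  def get(self):
    ProcessAll()

app = webapp2.WSGIApplication([('/cron/process_all', ProcessAllHandler)])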
Martin Omander