App Engine generating infinite retries

Question

I have a backend that is normally invoked by a cron to run a few times every day. Yesterday, I noticed it was restarting without stopping. I don't see a place in my code where that invocation happens. Rather, the task queue seems to indicate that it is running because of retries after errors. One error is that a status is saved to BigQuery, and that fails because a quota is exceeded. But this seems to generate an infinite loop. Is this a bug in App Engine, or am I doing something wrong? Is there a way to indicate that a task should not be restarted if it fails? My other App Engine tasks that terminate without a 200 status don't do that...

Here is a trace of the queue from which the restarts keep happening:

Here is the logging showing the continuous running:

And here is the HTTP header inside the logging:

UPDATE 1: Here is the cron:

<?xml version="1.0" encoding="UTF-8"?>
<cronentries>
<cron>
    <url>/uploadToBigQueryStatus</url>
    <description>Check fileNameSaved Status</description>
    <schedule>every 15 minutes from 02:30 to 03:30</schedule>
    <timezone>US/Pacific</timezone>
    <target>checkuploadstatus-backend</target>
</cron>
</cronentries>
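Assuming the end time of the schedule is inclusive (so a run lands exactly on 03:30), this cron fires only about five times per day, so the cron alone cannot explain a backend that keeps restarting. The small Java sketch below just enumerates the fire times for that window; `CronWindow` and `runTimes` are illustrative names, and the sketch ignores the US/Pacific time zone and DST transitions.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;

public class CronWindow {
    // Lists the times an "every <interval> from <start> to <end>" schedule fires on one day.
    // ASSUMPTION: <end> is inclusive; time zones and DST are ignored.
    static List<LocalTime> runTimes(LocalTime start, LocalTime end, Duration interval) {
        if (interval.isZero() || interval.isNegative()) {
            throw new IllegalArgumentException("interval must be positive");
        }
        List<LocalTime> times = new ArrayList<>();
        LocalTime t = start;
        while (!t.isAfter(end)) {
            times.add(t);
            LocalTime next = t.plus(interval);
            if (!next.isAfter(t)) {
                break; // wrapped past midnight: stop instead of looping forever
            }
            t = next;
        }
        return times;
    }

    public static void main(String[] args) {
        List<LocalTime> runs = runTimes(LocalTime.of(2, 30), LocalTime.of(3, 30), Duration.ofMinutes(15));
        // prints: 5 runs: [02:30, 02:45, 03:00, 03:15, 03:30]
        System.out.println(runs.size() + " runs: " + runs);
    }
}
```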

UPDATE 2: As for the comment about catching the error: I believe the error is that the BigQuery job fails because a quota has been hit. The strange thing is that it happened yesterday, and the quota should have been reset, so the error should have gone away for at least a while. I don't understand why the task retries; as far as I am aware, I never selected that option.

I killed the servlet and emptied the task queue, so at least it has stopped. But I don't know the root cause. If the BQ table quota was the reason, that shouldn't cause an infinite retry!

UPDATE 3: I have not trapped the servlet call that produced the error that led to the infinite retry. But I checked this cron-activated servlet today and found another non-200 result. The return value this time was 500, and it is caused by a Datastore timeout exception.

Here is the screenshot of the response that shows the 500 return code:

Here is the exception info, page 1:

And the following data:

The offending line is the for loop iterating over the Datastore query:

        if (keys[0] != null) {

            /* Define the query */
            q = new Query(bucket).setAncestor(keys[0]);

            pq = datastore.prepare(q);
            gotResult = false;

            // First system time stamp (the extra Date -> Timestamp -> Timestamp round trip is not needed)
            Date date = new Date();
            Timestamp timeStampNow = new Timestamp(date.getTime());

            for (Entity result : pq.asIterable()) {
                // ... loop body not shown in the question ...
            }
        }

I will add a try-catch around this for loop, since that is where it crashes.

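        if (keys[0] != null) {

            /* Define the query */
            q = new Query(bucket).setAncestor(keys[0]);

            pq = datastore.prepare(q);
            gotResult = false;

            // First system time stamp
            Date date = new Date();
            Timestamp timeStampNow = new Timestamp(date.getTime());

            try {

                for (Entity result : pq.asIterable()) {
                    // ... loop body not shown in the question ...
                }

            } catch (DatastoreTimeoutException e) {
                // Assumed catch clause (not shown in the question): log the timeout and let
                // this run end instead of crashing the servlet.
                // DatastoreTimeoutException is in com.google.appengine.api.datastore.
            }
        }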

Hopefully, the Datastore read will then no longer crash the servlet but will just report a failure. At least the cron will run again and pick up any unhandled results. By the way, is this a Java error or an App Engine error? I see a lot of these Datastore timeouts, and I will add a try-catch around all the result loops. Still, it should not cause the infinite retry that I experienced. I will see if I can find the actual crash; the problem is that it overloaded my logging. More later.
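Besides catching the frequent Datastore timeouts, one option is a bounded retry with a doubling delay, so a transient timeout gets a few more chances but the request still finishes in bounded time. The sketch below is generic Java, not the App Engine API: `SimulatedTimeoutException` is a hypothetical stand-in for the SDK's `DatastoreTimeoutException` so the example compiles without the SDK, and the attempt count and delay are illustrative values. In this sketch the operation passed in re-runs the whole query on each attempt.

```java
import java.util.concurrent.Callable;

public class BoundedRetry {
    // Hypothetical stand-in for com.google.appengine.api.datastore.DatastoreTimeoutException.
    static class SimulatedTimeoutException extends RuntimeException {
        SimulatedTimeoutException(String msg) { super(msg); }
    }

    // Runs `op` up to maxAttempts times, sleeping between attempts with a doubling delay.
    // Only SimulatedTimeoutException is retried; any other exception propagates immediately.
    static <T> T callWithRetry(Callable<T> op, int maxAttempts, long initialDelayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return op.call();
            } catch (SimulatedTimeoutException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up: the caller decides which status code to return
                }
                Thread.sleep(delay);
                delay *= 2;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated read that times out twice, then succeeds.
        String result = callWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new SimulatedTimeoutException("simulated datastore timeout");
            }
            return "ok";
        }, 5, 10);
        // prints: ok after 3 attempts
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```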

UPDATE 4: I went back to the logs to see when the infinite loop began. In the logs below, I opened the run at the head of the continuous running. You can see that it fails with 500 every 5th time. It was not the cron that invoked it; it was me calling the servlet to check the BigQuery upload status. (I write the job info to the Datastore, then read it back in the servlet, write the job status to BigQuery, and, if the job is done, erase the Datastore entry.) I cannot explain the steady 500 errors on every 5th call, but it is always the Datastore timeout exception.

UPDATE 5: Can the infinite retries be happening because of the queue configuration? CheckUploadStatus 20/s 10 100 10 200 2. I just noticed another task queue that had a 500 return code and was continuously retrying. I searched and found that some people have tried to configure their queues for no retries, and they said that didn't work.
See this link: Google App Engine: task_retry_limit doesn't work? But one retry is possible? That is far better than infinite.
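If you want "one retry, then stop" without relying on the queue's retry settings, a handler can read the `X-AppEngine-TaskRetryCount` header that App Engine adds to push-queue task requests (0 on the first attempt) and return 200 once a limit is reached, which the queue treats as success. The sketch below models only that decision: headers are passed as a plain `Map` so it runs without the servlet API (in a servlet you would use `request.getHeader(...)`, which is case-insensitive, unlike this map lookup), and `MAX_RETRIES = 1` is a hypothetical value.

```java
import java.util.Map;

public class RetryCap {
    // Header App Engine attaches to push-queue task requests; 0 on the first attempt.
    static final String RETRY_COUNT_HEADER = "X-AppEngine-TaskRetryCount";

    // Hypothetical policy value, not an App Engine default.
    static final int MAX_RETRIES = 1;

    // Status code to send after the work has failed:
    // non-2xx asks the queue to retry, 200 tells the queue the task is done.
    static int statusAfterFailure(Map<String, String> headers) {
        int retries = 0;
        String raw = headers.get(RETRY_COUNT_HEADER);
        if (raw != null) {
            try {
                retries = Integer.parseInt(raw.trim());
            } catch (NumberFormatException e) {
                retries = 0; // malformed header: treat as the first attempt
            }
        }
        return retries >= MAX_RETRIES ? 200 : 500;
    }

    public static void main(String[] args) {
        System.out.println(statusAfterFailure(Map.of()));                         // 500
        System.out.println(statusAfterFailure(Map.of(RETRY_COUNT_HEADER, "0"))); // 500
        System.out.println(statusAfterFailure(Map.of(RETRY_COUNT_HEADER, "1"))); // 200
    }
}
```

With `MAX_RETRIES = 1`, the first failed attempt returns 500 and gets exactly one retry; if that retry also fails, the handler returns 200 and the queue drops the task.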

It is contradictory that Google enforces quotas but seems to prefer infinite retries. I would much prefer to block retries by default on a non-200 return code, and then have NO QUOTAS!!!

Please edit your question and include your cron configuration for that job. — Dan Cornilescu, Jul 07 '16 at 22:19
You will need to catch the error, and send a 200 response to stop the task queue from retrying. — GAEfan, Jul 07 '16 at 23:31

score 0 · Answer 1 · answered Jul 08 '16 at 01:22

According to Retrying cron jobs that fail:

If a cron job's request handler returns a status code that is not in the range 200–299 (inclusive) App Engine considers the job to have failed. By default, failed jobs are not retried.

To set failed jobs to be retried:

1. Include a retry-parameters block in your cron.xml file.
2. Choose and set the retry parameters in the retry-parameters block.
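The failure rule quoted above reduces to a range check on the status code: anything outside 200–299 counts as a failed run, so a 3xx redirect is a failure too, not only 4xx/5xx. A trivial Java sketch (`CronStatus.isFailed` is an illustrative name, not an App Engine API):

```java
public class CronStatus {
    // Per the quoted documentation: a status outside 200-299 (inclusive) marks the cron run as failed.
    static boolean isFailed(int status) {
        return status < 200 || status > 299;
    }

    public static void main(String[] args) {
        int[] samples = {200, 204, 299, 302, 404, 500, 503};
        for (int s : samples) {
            System.out.println(s + " failed=" + isFailed(s));
        }
    }
}
```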

Your cron config doesn't specify the necessary retry parameters, so the jobs returning the 500 code should, indeed, not be retried, as you expect.

So this looks like a bug, possibly a variant of the (older) known issue 10075 (the 503 code mentioned there might have changed in the meantime), but it is also a quota-related failure.

The suggestion from GAEfan's comment is likely a good workaround:

You will need to catch the error, and send a 200 response to stop the task queue from retrying. – GAEfan
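GAEfan's workaround can be sketched as a wrapper that runs the work, logs any exception, and always answers 200, so the queue sees success and stops retrying. This is a hypothetical helper (`AckOnFailure`, `Job`, and `runAndAcknowledge` are illustrative names, not App Engine APIs); in a servlet you would pass the returned code to `response.setStatus(...)`.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class AckOnFailure {
    private static final Logger LOG = Logger.getLogger(AckOnFailure.class.getName());

    // Hypothetical unit of work, e.g. "read job entities, push status to BigQuery".
    interface Job {
        void run() throws Exception;
    }

    // Runs the job and always returns 200 so the task queue treats the task as done.
    // Failures are reported through the log instead of the status code.
    static int runAndAcknowledge(Job job) {
        try {
            job.run();
        } catch (Exception e) {
            LOG.log(Level.WARNING, "job failed; acknowledging anyway to stop retries", e);
        }
        return 200;
    }

    public static void main(String[] args) {
        System.out.println(runAndAcknowledge(() -> System.out.println("work done")));
        System.out.println(runAndAcknowledge(() -> { throw new Exception("simulated quota error"); }));
    }
}
```

The trade-off is that a failed run is visible only in the logs, so the next scheduled cron run has to pick up whatever was left unfinished.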
