0

I have a continuous WebJob which executes a function which is queue triggered and it occasionally throws System.Threading.Tasks.TaskCanceledException.

WebJobs SDK version : 2.1.0

StackTrace:

Microsoft.Azure.WebJobs.Host.FunctionInvocationException: 
Microsoft.Azure.WebJobs.Host.FunctionInvocationException : Exception while executing function: Functions.ProcessCreationQueueMessage ---> 
System.AggregateException : One or more errors occurred. ---> A task was canceled. at 
System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions) at
 System.Threading.Tasks.Task`1.GetResultCore(Boolean waitCompletionNotification) at System.Threading.Tasks.Task`1.get_Result() at
 Normandy.Portal.Reporting.Creation.WebJobs.Functions.ProcessCreationQueueMessage(String message,String& outputQueueMessage,TextWriter log) at
 D:\a\1\s\Portal\src\Normandy.Portal.Reporting.Creation.WebJobs\Functions.cs : 105 at lambda_method(Closure ,Functions ,Object[] ) at Microsoft.Azure.WebJobs.Host.Executors.VoidMethodInvoker`2.InvokeAsync(TReflected instance,Object[] arguments) at
async Microsoft.Azure.WebJobs.Host.Executors.FunctionInvoker`2.InvokeAsync[TReflected,TReturnValue](Object instance,Object[] arguments) at 
async Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.InvokeAsync(IFunctionInvoker invoker,ParameterHelper parameterHelper,CancellationTokenSource timeoutTokenSource,CancellationTokenSource functionCancellationTokenSource,Boolean throwOnTimeout,TimeSpan timerInterval,IFunctionInstance instance) at 
async Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.ExecuteWithWatchersAsync(IFunctionInstance instance,ParameterHelper parameterHelper,TraceWriter traceWriter,CancellationTokenSource functionCancellationTokenSource) 
at async Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.ExecuteWithLoggingAsync(??) 
at async Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.ExecuteWithLoggingAsync(??) ---> (Inner Exception #0) System.Threading.Tasks.TaskCanceledException : A task was canceled.<--- End of inner exception 
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at async Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.ExecuteWithLoggingAsync(??) at 
async Microsoft.Azure.WebJobs.Host.Executors.FunctionExecutor.TryExecuteAsync(IFunctionInstance functionInstance,CancellationToken cancellationToken) 

WebJob:

        public void ProcessCreationQueueMessage([QueueTrigger("reportcreation")] string message, [Queue("reportretrieval")] out string outputQueueMessage, TextWriter log)
        {
            JsonSerializerSettings settings = new JsonSerializerSettings { TypeNameHandling = TypeNameHandling.All };

            log.WriteLine(message);

        ...

            try
            {
                var jobId = _reportingService.RequestReport(req.ReportName, searchString).Result;

                var output = new RetrieveReportDataRequest()
                {
                    CreationRequest = new ReportCreationRequest()
                    {
                        UserId = req.UserId,
                        UID = req.UID,
                        CreateDate = req.CreateDate,
                        IsMultiSchoolUser = req.IsMultiSchoolUser,
                        CentreCode = req.CentreCode,
                        ReportName = req.ReportName,
                        InputSearchParameters = req.InputSearchParameters,
                    },
                    SplunkSearchParameters = searchString,
                    SplunkJobId = jobId,
                    Created = DateTime.UtcNow,
                };

                outputQueueMessage = JsonConvert.SerializeObject(output, settings);
            }
            catch (Exception ex)
            {
                log.WriteLine(ex.Message);
                throw;
            }
        }

There is no any time taking tasks apparently. This is failing once for certain messages. What am I missing?

PNDev
  • 590
  • 1
  • 8
  • 23
  • What is `RequestReport` doing? You may be calling methods somewhere that do async work under the hood and are throwing `TaskCanceledException` due to timeouts or something similar. For example, `HttpClient` is a common source of unexpected `TaskCanceledException`s, which are thrown on timeouts. – nlawalker May 01 '19 at 04:42
  • RequestReport is calling an external API asynchronously. Does the WebJob throw TaskCancelledException when any API method throws timeout or does the external api specifically has to throw TaskCancelledException? And is there any default timeout for asynchronous calls? – PNDev May 01 '19 at 05:04
  • If *_reportingService*-service is hosted on AppService and its execution takes much time (more than 230 sec) then it can be reason of fail - see [Why does my request time out after 230 seconds?](https://learn.microsoft.com/en-us/azure/app-service/faq-availability-performance-application-issues#why-does-my-request-time-out-after-230-seconds). Could you look at [Azure WebJob dashboard](https://learn.microsoft.com/en-us/azure/app-service/webjobs-create#ViewJobHistory) to check the execution time of failed calls of the webjob? – vladimir May 01 '19 at 20:04
  • Just curious, it looks like this webjob is activated by triggers, why is it [Continuous](https://learn.microsoft.com/en-us/azure/app-service/webjobs-create#webjob-types) not *Triggered*? – vladimir May 01 '19 at 20:07
  • @vladimir This webjob contains more than one trigger, Although I was not involved in this design, I idea was to make it continuous and keep all the related triggers in one WebJob. And the _reportingService is a Azure Service Fabric Service and definitely the webjob is not taking 230 seconds. From what I could see in the WebJobs Dashboard, all the failed runs has execution time between 40 seconds to 2 minutes. And successful runs have less than 2 seconds. I'm almost convinced that it's happening because of timeout but do you know any way to set this timeout? – PNDev May 02 '19 at 01:48
  • @PNDev I don't see where is the reason this issue ;(. There are some workarounds: use [RetryPolicy](https://github.com/App-vNext/Polly#retry) for the service call or implement smth like [REST Polling pattern](https://stackoverflow.com/a/55524185/303298). [please think about these points: is it ok that ReportingService processes request in 1-2 min? Can this time increase in the future?] – vladimir May 02 '19 at 04:11
  • suggestion: if ReportingService is using HttpClient try to increase [HttpClient.Timeout](https://learn.microsoft.com/en-us/dotnet/api/system.net.http.httpclient.timeout?view=netcore-2.2) – vladimir May 02 '19 at 04:35

1 Answers1

0

It can be related to host-settings - look at Azure Portal -> AppService configuration, make sure that Always On is set to On:

AppService config

vladimir
  • 13,428
  • 2
  • 44
  • 70