I have an application that runs periodically (it's a scheduled task). The task is launched once a minute, normally takes only a few seconds to do its business, and then exits.

But there's a ~1 in 80,000 chance (every two or three months) that the application will hang. The root cause is that we're using the Microsoft ServerXmlHttpRequest component to perform some work, and sometimes it just decides to hang. We use ServerXmlHttpRequest rather than XmlHttpRequest because the latter is not recommended where reliability and security are important (which is the case for an unattended server component):

The ServerXMLHTTP object offers functionality similar to that of the XMLHTTP object. Unlike XMLHTTP, however, the ServerXMLHTTP object does not rely on the WinInet control for HTTP access to remote XML documents. ServerXMLHTTP uses a new HTTP client stack. Designed for server applications, this server-safe subset of WinInet offers the following advantages:

  • Reliability — The HTTP client stack offers longer uptimes. WinInet features that are not critical for server applications, such as URL caching, auto-discovery of proxy servers, HTTP/1.1 chunking, offline support, and support for Gopher and FTP protocols are not included in the new HTTP subset.
  • Security — The HTTP client stack does not allow a user-specific state to be shared with another user's session. ServerXMLHTTP provides support for client certificates.

The job is being run as a scheduled task. I need the task to continue to run periodically, killing the existing process if it's hung.

The Windows Task Scheduler does have an option to forcibly stop a task that has been running too long:

[Screenshot of the Task Scheduler setting]

The only downside to that approach is that it doesn't work: it does not stop the task. The hung process keeps running.

Given that I cannot trust the Microsoft ServerXmlHttpRequest to not arbitrarily lock up, and the Task Scheduler is unable to terminate the scheduled task, I need some way to do it myself.

Jobs

I tried looking into using the Job Objects API:

A job object allows groups of processes to be managed as a unit. Job objects are namable, securable, sharable objects that control attributes of the processes associated with them. A job can enforce limits such as working set size, process priority, and end-of-job time limit on each process that is associated with the job.

That one note sounded like exactly what I needed:

A job can enforce limits such as end-of-job time limit on each process that is associated with the job.

The only downside to that approach is that it does not work. Jobs cannot impose a wall-clock time limit on a process; they can only impose a user-mode CPU time limit:

PerProcessUserTimeLimit

If LimitFlags specifies JOB_OBJECT_LIMIT_PROCESS_TIME, this member is the per-process user-mode execution time limit, in 100-nanosecond ticks.

If the process is idle (for example, sitting in a MsgWaitForMultipleObjects call, as ServerXmlHttpRequest is), then it accumulates no user time. I tested it: I created a job with a 1 second time limit and placed my own process into it. As long as I don't move the mouse around my test application, it quite happily sits there for longer than one second.
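The gap between CPU time and wall-clock time can be reproduced without job objects. The following is a minimal Python sketch, not the job-object test described above: it blocks with `time.sleep` as a stand-in for a hung wait, then compares elapsed wall-clock time with the CPU time charged to the process. The helper name `measure_blocked_wait` is made up for illustration, and `time.process_time` counts user plus kernel CPU time, so it is if anything a more generous measure than the user-mode-only job limit.

```python
import time


def measure_blocked_wait(seconds):
    """Block for `seconds` and return (wall_clock_elapsed, cpu_time_used)."""
    wall_start = time.monotonic()
    cpu_start = time.process_time()   # user + kernel CPU time of this process

    time.sleep(seconds)               # stands in for a wait that never returns

    wall_elapsed = time.monotonic() - wall_start
    cpu_used = time.process_time() - cpu_start
    return wall_elapsed, cpu_used


if __name__ == "__main__":
    wall, cpu = measure_blocked_wait(1.0)
    print(f"wall-clock: {wall:.3f}s, CPU: {cpu:.3f}s")
```

A blocked process should report roughly the full wall-clock duration but almost no CPU time, which is why a limit expressed in CPU time never trips for a hung wait.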

Watchdog Thread

The only other technique I can imagine, given that my main thread is indefinitely blocked, is to spawn another thread that sleeps for the timeout (say, three minutes) and then calls ExitProcess:

Int32 watchdogTimeoutSeconds = FindCmdLineSwitch("watchdog", 0);
if (watchdogTimeoutSeconds > 0)
{
    // Start a background thread that exits the process if it runs too long
    Thread thread = new Thread(KillMeCallback, new IntPtr(watchdogTimeoutSeconds));
    thread.Start();
}

void KillMeCallback(IntPtr data)
{
   Int32 secondsUntilProcessIsExited = data.ToInt32();
   if (secondsUntilProcessIsExited <= 0) 
      return;

   Sleep(secondsUntilProcessIsExited*1000); //seconds --> milliseconds

   LogToEventLog(ExtractFilename(Application.ExeName), 
         "Watchdog fired after "+secondsUntilProcessIsExited.ToString()+" seconds. Process will be forcibly exited.", EVENTLOG_WARNING_TYPE, 999);

   ExitProcess(999);
}

And that works. The only downside is that it's a bad idea.

Can anyone think of anything better?

Edit

For now I will implement a command-line switch:

Contoso.exe /watchdog 180

So the process will be exited after 180 seconds. This means the duration is configurable in the field, and the watchdog can easily be removed completely.
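As a rough illustration of the same idea in a runnable form, here is a Python sketch of a command-line-controlled watchdog. It is not the code used in Contoso.exe: the function names `parse_watchdog_seconds` and `start_watchdog` are hypothetical, the exit code 2 is arbitrary, and event-log reporting is left out. `os._exit` plays the role of ExitProcess in that it ends the process without waiting for the blocked main thread.

```python
import os
import sys
import threading


def parse_watchdog_seconds(argv):
    """Return the integer following a /watchdog or -watchdog switch, else 0."""
    for i, arg in enumerate(argv[:-1]):
        if arg.lower() in ("/watchdog", "-watchdog"):
            try:
                return max(int(argv[i + 1]), 0)
            except ValueError:
                return 0
    return 0


def start_watchdog(seconds, on_timeout=None):
    """Start a daemon timer that calls `on_timeout` after `seconds`.

    Returns the timer, or None when the watchdog is disabled.
    """
    if seconds <= 0:
        return None
    if on_timeout is None:
        # os._exit ends the process immediately, without running cleanup
        # handlers; the exit code 2 is an arbitrary choice.
        on_timeout = lambda: os._exit(2)
    timer = threading.Timer(seconds, on_timeout)
    timer.daemon = True   # never keeps a healthy process alive
    timer.start()
    return timer


if __name__ == "__main__":
    start_watchdog(parse_watchdog_seconds(sys.argv[1:]))
    # ... the real work of the task would run here ...
```

Passing the timeout action as a parameter keeps the hard exit as the default while letting the timer be exercised in a test without killing the test process.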

Ian Boyd
  • have you tried ServerXmlHttpRequest.setTimeouts? – Garr Godfrey May 05 '16 at 15:49
  • A simpler approach would be to create a mutex when the app starts, free it when it terminates. If your app runs the next time round and the mutex is set, terminate the existing process? (The flat wininet API is extremely stable, I've had fast polling apps run for months with zero problems.) – Alex K. May 05 '16 at 15:52
  • @GarrGodfrey I did look into that. ServerXmlHttpRequest has four timeouts: DNS resolve, connect, send, wait response. By default those timeouts are ∞, 60, 30, and 30 seconds respectively. I shipped an update where I change the default timeout for DNS resolution to 30 seconds (why would they default it to infinite? Do they think after seven years, that year number eight will be lucky!?). I'll have to wait four or five months to see whether that fixes it. But let's not confuse the example with the question. Perhaps I was using something else, say a database call using the Extensible Storage Engine. – Ian Boyd May 05 '16 at 15:58
  • can you change the code to work asynchronously? I see you are passing false to the open method. You could make call asynchronous and handle the wait yourself. – Garr Godfrey May 05 '16 at 16:01
  • well, in general ExitProcess is safer than killing the process from Task Manager. For a "better" solution you have to look at the specific case. – Garr Godfrey May 05 '16 at 16:14

1 Answer

I used the route where I pass a special WatchDog argument to my process on the command line:

>Contoso.exe /watchdog 180

During initialization I check for the presence of the WatchDog option, followed by an integer number of seconds:

String s = Toolkit.FindCmdLineOption("watchdog", ["/", "-"]);
if (s != "")
{
   Int32 seconds = StrToIntDef(s, 0);
   if (seconds > 0)
      RunInThread(WatchdogThreadProc, Pointer(seconds));
}

and my thread procedure:

void WatchdogThreadProc(Pointer Data)
{
   Int32 secondsUntilProcessIsExited = Int32(Data);

   if (secondsUntilProcessIsExited <= 0)
      return;

   Sleep(secondsUntilProcessIsExited*1000); //seconds -> milliseconds

   LogToEventLog(ExtractFileName(ParamStr(0)), 
         Format("Watchdog fired after %d seconds. Process will be forcibly exited.", secondsUntilProcessIsExited), 
         EVENTLOG_WARNING_TYPE, 999);

   ExitProcess(2);
}
Ian Boyd