I am trying to solve a problem, but I am struggling to find a viable solution. The issue is probably more about theory than about implementation. I simply need some other points of view.
The problem is:
We are using Navision Application Servers (NAS), which run business logic, replications, XML handling and similar jobs via Reports and Codeunits. At times some of these jobs get stuck in a loop, in a deadlock, and so on.
The ideal solution would be to fix the issues in the Codeunits and Reports so they can handle their own problems, but this is not an option: I don't really have access to the code of these jobs.
I am trying to find a way to at least partially automate the detection of these problems. The only way I can think of is to store some resource consumption statistics (CPU, SQL CPU and I/O, perhaps idle time) for each job and compare them with the figures from the next run. Any major difference would trigger an alarm.
If a job which normally takes 4 hours to complete gets stuck at the start of the process, I would like to know within a reasonable time, not after 6 hours when it is obvious.
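To make the idea more concrete, here is a rough sketch of the comparison I have in mind. Everything in it is an assumption for illustration: the names Checkpoint, StallDetector, ExpectedCpu and LooksStuck are made up, the baseline numbers are invented, and the 50% tolerance is arbitrary. It only shows the comparison step; how to actually collect the CPU figures per job is a separate question.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical types for illustration only; nothing here is part of NAS or Navision.
public class Checkpoint
{
    public TimeSpan Elapsed;   // time since the job started
    public double CpuSeconds;  // cumulative CPU time the job had used at that point
}

public static class StallDetector
{
    // Expected cumulative CPU at 'elapsed', linearly interpolated between the
    // checkpoints of a previous (healthy) run. The baseline must be sorted by
    // Elapsed; an implicit starting point of (0, 0) is assumed.
    public static double ExpectedCpu(IList<Checkpoint> baseline, TimeSpan elapsed)
    {
        TimeSpan prevTime = TimeSpan.Zero;
        double prevCpu = 0.0;
        foreach (Checkpoint c in baseline)
        {
            if (elapsed <= c.Elapsed)
            {
                double span = (c.Elapsed - prevTime).TotalSeconds;
                if (span <= 0) return c.CpuSeconds;
                double fraction = (elapsed - prevTime).TotalSeconds / span;
                return prevCpu + fraction * (c.CpuSeconds - prevCpu);
            }
            prevTime = c.Elapsed;
            prevCpu = c.CpuSeconds;
        }
        return prevCpu; // past the last checkpoint: use the final baseline value
    }

    // True when the actual figure deviates from the baseline by more than
    // 'tolerance' (0.5 = 50%) in either direction: far too little CPU hints
    // at a deadlock, far too much hints at a loop.
    public static bool LooksStuck(IList<Checkpoint> baseline, TimeSpan elapsed,
                                  double actualCpu, double tolerance)
    {
        double expected = ExpectedCpu(baseline, elapsed);
        if (expected <= 0) return false; // nothing to compare against yet
        return Math.Abs(actualCpu - expected) / expected > tolerance;
    }
}

public static class Program
{
    public static void Main()
    {
        // Invented baseline for a job that normally runs 4 hours.
        List<Checkpoint> baseline = new List<Checkpoint>
        {
            new Checkpoint { Elapsed = TimeSpan.FromHours(1), CpuSeconds = 600 },
            new Checkpoint { Elapsed = TimeSpan.FromHours(2), CpuSeconds = 1200 },
            new Checkpoint { Elapsed = TimeSpan.FromHours(4), CpuSeconds = 2400 }
        };

        TimeSpan halfHour = TimeSpan.FromMinutes(30);
        // 20 s used where about 300 s is expected: flagged.
        Console.WriteLine(StallDetector.LooksStuck(baseline, halfHour, 20, 0.5));  // True
        // 280 s used where about 300 s is expected: within tolerance.
        Console.WriteLine(StallDetector.LooksStuck(baseline, halfHour, 280, 0.5)); // False
    }
}
```

With something like this, a job that is stuck at the start would be flagged at the first check (30 minutes in the example) instead of after the whole expected run time has passed. What I am unsure about is whether CPU and I/O figures are stable enough between runs for such a comparison to be meaningful.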
I have full access to the SQL server, the NAS server and its process. I am using C# with .NET 4.
Thank you.