Since upgrading our Windows Domino server from 8.5.3 to 9, an important agent has failed to restart after a runtime error twice in two weeks. We have 3 Agent Manager instances running during the day (07:00 - 16:30) and 5 at night. Quite a few databases have agents scheduled every 5 minutes, but most of these agents finish in less than 1 second.
This important LotusScript agent is scheduled to run every 5 minutes. In practice it runs for 24 hours, hits the time limit and starts again (it spends most of that time sleeping). Sometimes it stops with a runtime error. The last time was yesterday, when it stopped at 11:30; it did not start again until today at 14:20, when I disabled and re-enabled the agent.
Before disabling and re-enabling it, I checked that all 3 Agent Manager instances were idle, but for some reason they were no longer picking up this agent. Here is the AMgr status from before the disable/enable:
> Tell Amgr Status
29.07.2013 14:17:38 AMgr: Status report at '29.07.2013 14:17:38'
29.07.2013 14:17:38 Agent Manager has been running since '29.07.2013 14:05:27'
29.07.2013 14:17:38 There are currently '3' Agent Executives running
29.07.2013 14:17:38 There are currently '520' agents in the Scheduled Task Queue
29.07.2013 14:17:38 There are currently '100' agents in the Eligible Queue
29.07.2013 14:17:38 There are currently '1' databases containing agents triggered by new mail
29.07.2013 14:17:38 There are currently '1' agents in the New Mail Event Queue
29.07.2013 14:17:38 There are currently '0' databases containing agents triggered by document updates
29.07.2013 14:17:38 There are currently '0' agents in the Document Update Event Queue
29.07.2013 14:17:38 AMgr: Current control parameters in effect:
29.07.2013 14:17:38 AMgr: Daily agent cache refresh is performed at '04:15:00'
29.07.2013 14:17:38 AMgr: Currently in Daytime period
29.07.2013 14:17:38 AMgr: The maximum number of concurrently executing agents is '3'
29.07.2013 14:17:38 AMgr: The maximum number of minutes a LotusScript/Java agent is allowed to run is '1440'
29.07.2013 14:17:38 AMgr: Executive '1', total agent runs: 322855
29.07.2013 14:17:38 AMgr: Executive '1', total elapsed run time: 28064
29.07.2013 14:17:38 AMgr: Executive '2', total agent runs: 102967
29.07.2013 14:17:38 AMgr: Executive '2', total elapsed run time: 364127
29.07.2013 14:17:38 AMgr: Executive '3', total agent runs: 297064
29.07.2013 14:17:38 AMgr: Executive '3', total elapsed run time: 78582
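To make the per-executive numbers easier to compare, here is a minimal Python sketch (run outside Domino) that divides each executive's total elapsed run time by its total agent runs. It assumes the elapsed time is reported in seconds, which the status output does not state, and the function names `per_executive` and `average_run_time` are made up for illustration.

```python
import re

# Executive lines copied from the `Tell Amgr Status` output above
# (timestamps dropped for brevity).
STATUS = """\
AMgr: Executive '1', total agent runs: 322855
AMgr: Executive '1', total elapsed run time: 28064
AMgr: Executive '2', total agent runs: 102967
AMgr: Executive '2', total elapsed run time: 364127
AMgr: Executive '3', total agent runs: 297064
AMgr: Executive '3', total elapsed run time: 78582
"""

LINE = re.compile(r"Executive '(\d+)', total (agent runs|elapsed run time): (\d+)")


def per_executive(text):
    """Return {executive: {"runs": int, "elapsed": int}} parsed from status text."""
    stats = {}
    for exe, kind, value in LINE.findall(text):
        key = "runs" if kind == "agent runs" else "elapsed"
        stats.setdefault(int(exe), {})[key] = int(value)
    return stats


def average_run_time(stats):
    """Average elapsed time per run for each executive (unit ASSUMED to be seconds)."""
    return {exe: s["elapsed"] / s["runs"] for exe, s in stats.items()}


if __name__ == "__main__":
    averages = average_run_time(per_executive(STATUS))
    for exe, avg in sorted(averages.items()):
        print(f"Executive {exe}: {avg:.2f} per run")
```

Under that assumption, Executive 2 averages roughly 3.5 per run while the other two stay well below 0.3. That would fit the long-running agent having mostly run on Executive 2, although the status output alone does not prove it.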
The Eligible Queue seems to be capped at 100 agents, because I always see exactly 100 there. Is that the problem, and if so, how can I increase the maximum?
If the Agent Manager were too busy during the day (which did not seem to be the case, because all 3 instances were idle when I looked), I would expect it to start the agent at night at the latest, when there are 5 instances.
Any ideas on how to fix the problem? Or should I just add AMgr debug parameters to notes.ini to get more information the next time this happens?
After this last occurrence I disabled agents in some old databases and increased the number of AMgr instances by 1.
I also tried to reproduce the runtime error in a different database with a simple test agent, but that agent started again after the error.