I have two Windows machines, one running the Jenkins master and one running a Jenkins slave. On both machines Jenkins is installed as a service, and the slave is configured to be taken offline after 300 minutes of inactivity. Software tests should be executed on both machines during the night. When I check in the morning, I often find the following situation:
- Jenkins master is up and running, all tests were executed on this machine.
- Several jobs are in starvation mode because the slave is offline.
- The Jenkins slave Windows service is stopped.
- Restarting the master and starting a job on the slave node does not bring the slave online.
No useful error information can be found on the slave. The last lines in jenkins-slave.err.log are:
INFO: Connected
Apr 01, 2019 3:40:23 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Terminated
Apr 01, 2019 3:40:33 PM jenkins.slaves.restarter.JnlpSlaveRestarterInstaller$FindEffectiveRestarters$1 onReconnect
INFO: Restarting agent via jenkins.slaves.restarter.WinswSlaveRestarter@99751ad
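To see how often this terminate-then-restart pattern occurs, the log could be scanned with a small script like the one below. This is only a sketch: the function name `find_restart_attempts` is made up for illustration, and it assumes the English `java.util.logging` timestamp format shown in the excerpt above.

```python
import re

# Assumption: header lines start with a timestamp like "Apr 01, 2019 3:40:23 PM ".
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} \d{2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M ")


def find_restart_attempts(lines):
    """Return (header_line, message) pairs for termination and restart events.

    header_line is the most recent timestamped line seen before the message,
    or None if the message appears before any timestamped line.
    """
    events = []
    header = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if TIMESTAMP.match(line):
            # Remember the latest timestamped header for the next message line.
            header = line
        elif line.startswith(("INFO: Terminated", "INFO: Restarting agent")):
            events.append((header, line))
    return events
```

It can be run against the file with `find_restart_attempts(open("jenkins-slave.err.log", encoding="utf-8", errors="replace"))`. A "Terminated" entry followed by a "Restarting agent" entry with no later "Connected" entry would match the situation described here.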
The master log just contains many lines like:
Apr 02, 2019 9:08:23 AM hudson.slaves.RetentionStrategy$Demand check
INFO: Disconnecting computer XYZ as it has been idle for 23 hr
The slave.log on the master does not help either:
Remoting version: 3.27
This is a Windows agent
Agent successfully connected and online
ERROR: Connection terminated
java.nio.channels.ClosedChannelException
I found an event in the Windows Event Viewer saying:
The Jenkins agent (jenkinsslave-C__Program Files (x86)_Jenkins-Slave) service failed to start due to the following error:
The service did not respond to the start or control request in a timely fashion.
I added the following option to the command lines of both master and slave:
-Dhudson.lifecycle=hudson.lifecycle.WindowsServiceLifecycle
Once I manually start the Windows service on the slave machine, the slave comes back online and the jobs continue.
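As a stopgap, the manual start could be automated with a scheduled watchdog on the slave machine. The following is a rough sketch, not a fix for the underlying problem. It assumes the service name shown in the event log entry above, that `sc` is on the PATH, that the script runs with sufficient privileges, and that `sc query` prints its English output format. The helper names are hypothetical.

```python
import re
import subprocess

# Assumption: service name copied from the Windows event log entry.
SERVICE_NAME = "jenkinsslave-C__Program Files (x86)_Jenkins-Slave"


def parse_sc_state(sc_output):
    """Extract the state name (e.g. 'STOPPED', 'RUNNING') from `sc query` output.

    Returns None if no STATE line is found.
    """
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    return match.group(1) if match else None


def ensure_running(service=SERVICE_NAME):
    """Query the service and start it if it is stopped.

    Returns the state observed before any start attempt.
    """
    result = subprocess.run(
        ["sc", "query", service], capture_output=True, text=True
    )
    state = parse_sc_state(result.stdout)
    if state == "STOPPED":
        # Raises CalledProcessError if the start request is rejected.
        subprocess.run(["sc", "start", service], check=True)
    return state
```

Calling `ensure_running()` from a Windows scheduled task every few minutes would mimic what I currently do by hand.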
I have the impression that this is related to Windows updates being installed automatically on the master. But if that is the cause, how can I make the slave reconnect on its own?
I am thankful for any ideas why this is happening or how I can investigate this issue further.