Jenkins version: 1.643.2
Docker Plugin version: 0.16.0
In my Jenkins environment, I have a Jenkins master with 2-5 slave node servers (slave1, slave2, slave3).
Each of these slaves are configured in Jenkins Global configuration using Docker Plugin.
Everything is working at this minute.
I saw our monitoring system throwing some alerts for high SWAP space usage on slave3 (for ex IP: 11.22.33.44) so I ssh'ed to that machine and ran: sudo docker ps
which gave me the valid output for the currently running docker containers on this slave3 machine.
By running ps -eo pmem,pcpu,vsize,pid,cmd | sort -k 1 -nr | head -10
on the target slave's machine (where 4 containers were running), I found the top 5 processes eating all the RAM was java -jar slave.jar
running inside each container. So I thought why not restart the shit and recoup some memory back. In the following output, I see what was the state of sudo docker ps
command before and after the docker restart
<container_instance>
step. SCROLL right, you'll notice that in the 2nd line for container ID ending with ...0a02
, the virtual port (listed under heading NAMES) on the host (slave3) machine was 1053 (which was mapped to container's virtual IP's port 22 for SSH). Cool, what this means is, when from Jenkins Manage Node
section, if you try to Relaunch a slave's container, Jenkins will try to connect to the HOST IP's 11.22.33.44:1053 and do whatever it's supposed to successfully bring the slave up. So, Jenkins is holding that PORT (1053) somewhere.
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ae3eb02a278d docker.someinstance.coolcompany.com:443/jenkins-slave-stable-image:1.1 "bash -c '/usr/sbin/s" 26 hours ago Up 26 hours 0.0.0.0:1048->22/tcp lonely_lalande
d4745b720a02 docker.someinstance.coolcompany.com:443/jenkins-slave-stable-image:1.1 "bash -c '/usr/sbin/s" 9 days ago Up About an hour 0.0.0.0:1053->22/tcp cocky_yonath
bd9e451265a6 docker.someinstance.coolcompany.com:443/jenkins-slave-stable-image:1.1 "bash -c '/usr/sbin/s" 9 days ago Up About an hour 0.0.0.0:1050->22/tcp stoic_bell
0e905a6c3851 docker.someinstance.coolcompany.com:443/jenkins-slave-stable-image:1.1 "bash -c '/usr/sbin/s" 9 days ago Up About an hour 0.0.0.0:1051->22/tcp serene_tesla
sudo docker restart d4745b720a02; echo $?
d4745b720a02
0
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ae3eb02a278d docker.someinstance.coolcompany.com:443/jenkins-slave-stable-image:1.1 "bash -c '/usr/sbin/s" 26 hours ago Up 26 hours 0.0.0.0:1048->22/tcp lonely_lalande
d4745b720a02 docker.someinstance.coolcompany.com:443/jenkins-slave-stable-image:1.1 "bash -c '/usr/sbin/s" 9 days ago Up 4 seconds 0.0.0.0:1054->22/tcp cocky_yonath
bd9e451265a6 docker.someinstance.coolcompany.com:443/jenkins-slave-stable-image:1.1 "bash -c '/usr/sbin/s" 9 days ago Up About an hour 0.0.0.0:1050->22/tcp stoic_bell
0e905a6c3851 docker.someinstance.coolcompany.com:443/jenkins-slave-stable-image:1.1 "bash -c '/usr/sbin/s" 9 days ago Up About an hour 0.0.0.0:1051->22/tcp serene_tesla
After running the sudo docker restart <instanceIDofContainer>
I ran free -h
/ grep -i swap /proc/meminfo
and found RAM (which was earlier fully used and was showing only remaining 230MB free) is now 1GB free and SWAP size which was 1G total, 1G used (I tried both swappiness 60 or 10), is now 450MB swap space free. So the alert thing got resolved. Cool.
BUT, now as you notice from the sudo docker ps
output above, after the restart step, for that container ID ...0a02
, I now got a new PORT# 1054!!
When I went to Manage Nodes > Tried to bring this node offline, stopped it, and relaunched it, Jenkins is NOT picking up the NEW PORT (1054). It's still somehow picking the old port 1053 (while trying to make a SSH connection to 11.22.33.44 (Host's IP) on port 1053 (which is mapped to container's Virtual IP's port # 22 (ssh)).
How can I change this port or configuration in Jenkins for this slave container so that Jenkins will see the new PORT and can successfully relaunch?
PS: Clicking "Configure" on the Node to see it's configuration is NOT showing me anything other than just Name field. Usually there's a lot of fields in a regular slave (where you can define the labels, root dir, launch method, properties env variables, tools for the slave environment but I guess for these Docker containers, I'm not seeing anything other than just Name field). Clicking Test Connection in Jenkins Global configuration (under Docker Plugin's section) shows it's successfully finding Docker version 1.8.3
Right now, as 1053 port (telnet) is not working as it's now 1054 for this container's instanceID (after the restart step), Jenkins relaunch step is failing during SSH connection step (first thing it does to connect via SSH method).
[07/27/17 17:17:19] [SSH] Opening SSH connection to 11.22.33.44:1053.
Connection timed out
ERROR: Unexpected error in launching a slave. This is probably a bug in Jenkins.
java.lang.IllegalStateException: Connection is not established!
at com.trilead.ssh2.Connection.getRemainingAuthMethods(Connection.java:1030)
at com.cloudbees.jenkins.plugins.sshcredentials.impl.TrileadSSHPasswordAuthenticator.canAuthenticate(TrileadSSHPasswordAuthenticator.java:82)
at com.cloudbees.jenkins.plugins.sshcredentials.SSHAuthenticator.newInstance(SSHAuthenticator.java:207)
at com.cloudbees.jenkins.plugins.sshcredentials.SSHAuthenticator.newInstance(SSHAuthenticator.java:169)
at hudson.plugins.sshslaves.SSHLauncher.openConnection(SSHLauncher.java:1212)
at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:711)
at hudson.plugins.sshslaves.SSHLauncher$2.call(SSHLauncher.java:706)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[07/27/17 17:19:26] Launch failed - cleaning up connection
[07/27/17 17:19:26] [SSH] Connection closed.