We are trying to set up a new host for the Jenkins Docker agent with a rootless setup. We already have a CI/CD pipeline with the same scheme, except that it is not running rootless, and due to security requirements we need to transition to a rootless Docker setup. The issue is that on the new host, when the Jenkins Docker agent tries to start, it dies shortly afterwards with an error message from curl saying it could not write the slave.jar file to the /home/jenkins directory.
The expectation is that the Jenkins Docker agent spins up a container on the host, which runs a rootless Docker daemon under the jenkins user, and that inside that container it spins up another container holding the build job itself, with the tools and files from the project. This is how it works today, except that the second container still runs on the host side by side with the agent container rather than inside it, and everything runs rootful, with -v /var/run/docker.sock:/var/run/docker.sock on the host and --group-add 998 (the docker group on the host) passed to the containers.
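For reference, the current rootful arrangement described above corresponds roughly to the command line below. This is only a sketch: the helper name rootful_agent_cmd and the image name are hypothetical, the function just prints the command instead of running it, and YADP may pass further options that are not shown here.

```sh
# Print (do not run) a docker command resembling today's rootful setup.
# $1 = image name (hypothetical), $2 = GID of the host's docker group.
rootful_agent_cmd() {
    image=$1
    docker_gid=${2:-998}
    printf 'docker run -v %s:%s --group-add %s %s\n' \
        /var/run/docker.sock /var/run/docker.sock "$docker_gid" "$image"
}

# Hypothetical image name, for illustration only.
rootful_agent_cmd harbor.example.com/ci/inbound-agent:latest
```

Mounting the host socket this way gives the container access to the root-owned daemon, which is what the rootless migration is meant to remove.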
This is what we have done:
The Jenkins master runs on a VM with YetAnotherDockerPlugin (YADP) for provisioning cloud nodes. The Jenkins slave host, where the nodes will be provisioned, runs Ubuntu Jammy with dockerd-rootless.sh running as a service under the jenkins user, as per:
https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository
https://docs.docker.com/engine/security/rootless/
https://unix.stackexchange.com/questions/587674/systemd-not-detected-dockerd-daemon-needs-to-be-started-manually
The ports, ping, expose API over TCP and set XDG_RUNTIME_DIR to /run/user/1000 (the jenkins user's UID) parts of the guide are done. We skipped the "running rootless inside a rootful" part, since we are trying to avoid running anything as root except inside the unprivileged rootless container. That way there is a separation between the host and the container, and the tools inside the container (maven, yarn, sonar etc., depending on the project being built) have everything they need to build and test the app without having access to anything on the host.
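As a sketch of the client-side environment this implies when testing locally on the host: the rootless daemon's default socket lives under XDG_RUNTIME_DIR, so a docker client run as the jenkins user would be pointed at it roughly as below. The helper name rootless_docker_host is hypothetical, and UID 1000 is taken from the description above.

```sh
# Print the rootless daemon's default socket URL for a given UID.
rootless_docker_host() {
    uid=$1
    printf 'unix:///run/user/%s/docker.sock\n' "$uid"
}

# Assumed usage on the agent host, as the jenkins user (UID 1000).
export XDG_RUNTIME_DIR=/run/user/1000
export DOCKER_HOST="$(rootless_docker_host 1000)"
echo "$DOCKER_HOST"   # prints unix:///run/user/1000/docker.sock
```

YADP itself talks to the daemon over the exposed TCP endpoint, so this only matters for commands run directly on the host.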
YADP is set up to spin up a modified version of jenkins/inbound-agent, built from the following Dockerfile:
FROM jenkins/inbound-agent:latest

# Version of the static Docker client binary to install
ARG docker_version=20.10.21
ENV docker_version="${docker_version}"

# Root is needed to install packages and place the client in /usr/bin
USER root
RUN apt-get update -qq && apt-get install -qqy \
        apt-transport-https \
        ca-certificates \
        curl \
    && rm -rf /var/lib/apt/lists/* \
    && update-ca-certificates -f \
    && curl -vOL "https://download.docker.com/linux/static/stable/x86_64/docker-${docker_version}.tgz" \
    && tar zxvf "docker-${docker_version}.tgz" \
    && chmod +x docker/docker \
    && mv docker/docker /usr/bin/ \
    && rm -rf docker*

# Drop back to the unprivileged user for the agent itself
USER jenkins
We then push the image to our local Harbor and fetch it from there with YADP to provision new cloud nodes. This setup works as long as we don't care about rootless.
YADP is set up to run as jenkins, with /home/jenkins as the remote file system root, over the JNLP protocol with remoting.
Connection test from YADP returns this:
com.github.kostyasha.yad_docker_java.com.github.dockerjava.api.model.Version@69e13e37[
apiVersion=1.42
arch=amd64
gitCommit=bc3805a
goVersion=go1.19.5
kernelVersion=5.15.0-60-generic
operatingSystem=linux
version=23.0.1
buildTime=2023-02-09T19:47:01.000000000+00:00
experimental=<null>
minAPIVersion=1.12
platform=VersionPlatform(name=Docker Engine - Community)
components=[VersionComponent(details={ApiVersion=1.42, Arch=amd64, BuildTime=2023-02-09T19:47:01.000000000+00:00, Experimental=false, GitCommit=bc3805a, GoVersion=go1.19.5, KernelVersion=5.15.0-60-generic, MinAPIVersion=1.12, Os=linux}, name=Engine, version=23.0.1), VersionComponent(details={GitCommit=31aa4358a36870b21a992d3ad2bef29e1d693bec}, name=containerd, version=1.6.16), VersionComponent(details={GitCommit=v1.1.4-0-g5fd4c4d}, name=runc, version=1.1.4), VersionComponent(details={GitCommit=de40ad0}, name=docker-init, version=0.19.0), VersionComponent(details={ApiVersion=1.1.1, NetworkDriver=slirp4netns, PortDriver=builtin, StateDir=/tmp/rootlesskit1966947544}, name=rootlesskit, version=1.1.0), VersionComponent(details={GitCommit=6a7b16babc95b6a3056b33fb45b74a6f62262dd4}, name=slirp4netns, version=1.0.1)]
]
When testing the setup directly on the host with docker run hello-world or any other container, it works and the container spins up. However, when trying to provision a job from the master to the cloud node, we see the following behaviour (console output from the Jenkins job):
Started by user
Replayed #47
Obtained Jenkinsfile from b4678923ee56401f632f88b52d62980c9856d98c
Loading library utils@dockermocker
Attempting to resolve dockermocker from remote references...
> git --version # timeout=10
> git --version # 'git version 2.21.0'
using GIT_ASKPASS to set credentials
> git ls-remote -h -- ssh://git@stash.REDACTEX.XYZ:7999/k8s/jenkins-utils.git # timeout=10
Could not find dockermocker in remote references. Pulling heads to local for deep search...
> git rev-parse --resolve-git-dir /var/lib/jenkins/caches/git-223646e579ae9344bc0d6094cefb3bdc/.git # timeout=10
Setting origin to ssh://git@stash.example.com:7999/k8s/jenkins-utils.git
> git config remote.origin.url ssh://git@stash.example.com:7999/k8s/jenkins-utils.git # timeout=10
Fetching origin...
Fetching upstream changes from origin
> git --version # timeout=10
> git --version # 'git version 2.21.0'
> git config --get remote.origin.url # timeout=10
using GIT_ASKPASS to set credentials
> git fetch --tags --force --progress -- origin +refs/heads/*:refs/remotes/origin/* # timeout=10
> git rev-parse dockermocker^{commit} # timeout=10
> git branch -a -v --no-abbrev --contains bd8e66827842f81fabb6687eb84a105ec22ddeda # timeout=10
Selected match: fix/sonar_bioreg revision bd8e66827842f81fabb6687eb84a105ec22ddeda
Selected Git installation does not exist. Using Default
The recommended git tool is: NONE
using credential 6ae3d748-ca5f-4025-89ed-a0197566a767
> git rev-parse --resolve-git-dir /var/lib/jenkins/jobs/marine_resources/jobs/exemption-system-test/branches/frcrecompile/workspace@libs/e24118966322b2b1294773b08a640f11f78ab73117b82eef54f0ddc54c95568b/.git # timeout=10
Fetching changes from the remote Git repository
> git config remote.origin.url ssh://git@stash.example.com:7999/k8s/jenkins-utils.git # timeout=10
Fetching without tags
Fetching upstream changes from ssh://git@stash.example.com:7999/k8s/jenkins-utils.git
> git --version # timeout=10
> git --version # 'git version 2.21.0'
using GIT_ASKPASS to set credentials
> git fetch --no-tags --force --progress -- ssh://git@stash.example.com:7999/k8s/jenkins-utils.git +refs/heads/*:refs/remotes/origin/* # timeout=10
Checking out Revision bd8e66827842f81fabb6687eb84a105ec22ddeda (fix/sonar_bioreg)
> git config core.sparsecheckout # timeout=10
> git checkout -f bd8e66827842f81fabb6687eb84a105ec22ddeda # timeout=10
Commit message: "glemte -e i testfilen"
[Bitbucket] Notifying commit build result
[Pipeline] Start of Pipeline
[Pipeline] node
Still waiting to schedule task
‘skolest-1d58cf286c95’ is offline
Aborted by
[Pipeline] // node
[Pipeline] End of Pipeline
[Bitbucket] Notifying commit build result
[Bitbucket] Build result notified
Finished: ABORTED
Output from journalctl --user -xeu docker.service on the host:
Feb 10 14:02:22 skolest dockerd-rootless.sh[1717]: time="2023-02-10T14:02:22.651393026+01:00" level=info msg="loading plugin \"io.containerd.event.v1.publisher\"..." runtime=io.containerd.runc.v2 type=io.containerd.event.v1
Feb 10 14:02:22 skolest dockerd-rootless.sh[1717]: time="2023-02-10T14:02:22.652601430+01:00" level=info msg="loading plugin \"io.containerd.internal.v1.shutdown\"..." runtime=io.containerd.runc.v2 type=io.containerd.internal.v1
Feb 10 14:02:22 skolest dockerd-rootless.sh[1717]: time="2023-02-10T14:02:22.652666939+01:00" level=info msg="loading plugin \"io.containerd.ttrpc.v1.task\"..." runtime=io.containerd.runc.v2 type=io.containerd.ttrpc.v1
Feb 10 14:02:22 skolest dockerd-rootless.sh[1717]: time="2023-02-10T14:02:22.653313374+01:00" level=info msg="starting signal loop" namespace=moby path=/run/.ro1860578725/user/1000/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/217715836cffdddd2d9d0d83178ec29f22e9789f2f484c675>
Feb 10 14:02:23 skolest 217715836cff[1682]: + cat
Feb 10 14:02:23 skolest 217715836cff[1682]: + chmod +x /tmp/init.sh
Feb 10 14:02:23 skolest 217715836cff[1682]: + exec /tmp/init.sh
Feb 10 14:02:23 skolest 217715836cff[1682]: + export CONFIG=/tmp/config.sh
Feb 10 14:02:23 skolest 217715836cff[1682]: + [ ! -f /tmp/config.sh ]
Feb 10 14:02:23 skolest 217715836cff[1682]: + echo No config, sleeping for 1 second
Feb 10 14:02:23 skolest 217715836cff[1682]: No config, sleeping for 1 second
Feb 10 14:02:23 skolest 217715836cff[1682]: + sleep 1
Feb 10 14:02:24 skolest 217715836cff[1682]: + [ ! -f /tmp/config.sh ]
Feb 10 14:02:33 skolest 217715836cff[1682]: + echo Found config file
Feb 10 14:02:33 skolest 217715836cff[1682]: Found config file
Feb 10 14:02:33 skolest 217715836cff[1682]: + . /tmp/config.sh
Feb 10 14:02:33 skolest 217715836cff[1682]: + JENKINS_URL=https://jenkins.example.com/
Feb 10 14:02:33 skolest 217715836cff[1682]: + JENKINS_USER=root
Feb 10 14:02:33 skolest 217715836cff[1682]: + JENKINS_HOME=/home/jenkins
Feb 10 14:02:33 skolest 217715836cff[1682]: + COMPUTER_URL=computer/skolest%2D217715836cff/
Feb 10 14:02:33 skolest 217715836cff[1682]: + COMPUTER_SECRET=e94ad9e249973aa7f77cc1ebf1f167976d079bafd807d25dca409556c8378819
Feb 10 14:02:33 skolest 217715836cff[1682]: + JAVA_OPTS=-Djavax.net.ssl.trustStore=/etc/ssl/certs/java/cacerts
Feb 10 14:02:33 skolest 217715836cff[1682]: + SLAVE_OPTS=
Feb 10 14:02:33 skolest 217715836cff[1682]: + NO_CERTIFICATE_CHECK=false
Feb 10 14:02:33 skolest 217715836cff[1682]: + NO_RECONNECT_SLAVE=true
Feb 10 14:02:33 skolest 217715836cff[1682]: + [ -z https://jenkins.example.com/ ]
Feb 10 14:02:33 skolest 217715836cff[1682]: + [ -z computer/skolest%2D217715836cff/ ]
Feb 10 14:02:33 skolest 217715836cff[1682]: + [ -z /home/jenkins ]
Feb 10 14:02:33 skolest 217715836cff[1682]: + id -u jenkins
Feb 10 14:02:33 skolest 217715836cff[1682]: 1000
Feb 10 14:02:33 skolest 217715836cff[1682]: + [ ! -d /home/jenkins ]
Feb 10 14:02:33 skolest 217715836cff[1682]: + stat /home/jenkins
Feb 10 14:02:33 skolest 217715836cff[1682]: File: /home/jenkins
Feb 10 14:02:33 skolest 217715836cff[1682]: Size: 4096 Blocks: 8 IO Block: 4096 directory
Feb 10 14:02:33 skolest 217715836cff[1682]: Device: 30h/48d Inode: 802977 Links: 1
Feb 10 14:02:33 skolest 217715836cff[1682]: Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Feb 10 14:02:33 skolest 217715836cff[1682]: Access: 2023-02-10 09:27:49.538874022 +0000
Feb 10 14:02:33 skolest 217715836cff[1682]: Modify: 2023-02-10 13:02:22.740563969 +0000
Feb 10 14:02:33 skolest 217715836cff[1682]: Change: 2023-02-10 13:02:22.740563969 +0000
Feb 10 14:02:33 skolest 217715836cff[1682]: Birth: 2023-02-10 13:02:22.740563969 +0000
Feb 10 14:02:33 skolest 217715836cff[1682]: + cd /home/jenkins
Feb 10 14:02:33 skolest 217715836cff[1682]: + [ false = true ]
Feb 10 14:02:33 skolest 217715836cff[1682]: + WGET_OPTIONS=
Feb 10 14:02:33 skolest 217715836cff[1682]: + CURL_OPTIONS=
Feb 10 14:02:33 skolest 217715836cff[1682]: + NO_SLAVE_CERT=
Feb 10 14:02:33 skolest 217715836cff[1682]: + command -v wget
Feb 10 14:02:33 skolest 217715836cff[1682]: + [ -x ]
Feb 10 14:02:33 skolest 217715836cff[1682]: + command -v curl
Feb 10 14:02:33 skolest 217715836cff[1682]: + [ -x /usr/bin/curl ]
Feb 10 14:02:33 skolest 217715836cff[1682]: + curl --remote-name https://jenkins.example.com//jnlpJars/slave.jar
Feb 10 14:02:33 skolest 217715836cff[1682]: % Total % Received % Xferd Average Speed Time Time Time Current
Feb 10 14:02:33 skolest 217715836cff[1682]: Dload Upload Total Spent Left Speed
Feb 10 14:02:33 skolest 217715836cff[1682]: [142B blob data]
Feb 10 14:02:33 skolest 217715836cff[1682]: [79B blob data]
Feb 10 14:02:33 skolest 217715836cff[1682]: curl: (23) Failure writing output to destination
Feb 10 14:02:33 skolest dockerd-rootless.sh[1682]: time="2023-02-10T14:02:33.199921899+01:00" level=info msg="ignoring event" container=217715836cffdddd2d9d0d83178ec29f22e9789f2f484c67583b3884146b8794 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
Feb 10 14:02:33 skolest dockerd-rootless.sh[1717]: time="2023-02-10T14:02:33.200024658+01:00" level=info msg="shim disconnected" id=217715836cffdddd2d9d0d83178ec29f22e9789f2f484c67583b3884146b8794
Feb 10 14:02:33 skolest dockerd-rootless.sh[1717]: time="2023-02-10T14:02:33.200112878+01:00" level=warning msg="cleaning up after shim disconnected" id=217715836cffdddd2d9d0d83178ec29f22e9789f2f484c67583b3884146b8794 namespace=moby
Feb 10 14:02:33 skolest dockerd-rootless.sh[1717]: time="2023-02-10T14:02:33.200145579+01:00" level=info msg="cleaning up dead shim"
Feb 10 14:02:33 skolest dockerd-rootless.sh[1717]: time="2023-02-10T14:02:33.224443531+01:00" level=warning msg="cleanup warnings time=\"2023-02-10T14:02:33+01:00\" level=info msg=\"starting signal loop\" namespace=moby pid=5848 runtime=io.containerd.runc.v2\n"
From the journalctl output we see that the /home/jenkins directory inside the container it is trying to spin up has permissions (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root), while the container is run by user jenkins with uid 1000. The curl command fails when trying to write slave.jar, and the container dies.
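To spell out why those values would produce the curl failure, here is a minimal sketch of the classic Unix mode-bit check applied to the numbers from the stat output. The function name can_write is hypothetical, and the check is deliberately simplified: it ignores supplementary groups, ACLs and user-namespace mapping, and it assumes the writing process really runs as uid 1000 with gid 1000 (the log only shows the uid).

```sh
# can_write MODE OWNER_UID OWNER_GID PROC_UID PROC_GID
# Succeeds (exit status 0) if the plain mode bits grant write access.
can_write() {
    mode=$(( 0$1 ))      # leading 0 makes the shell read the mode as octal
    owner_uid=$2 owner_gid=$3 proc_uid=$4 proc_gid=$5

    if [ "$proc_uid" -eq 0 ]; then
        return 0                          # root normally bypasses mode bits
    elif [ "$proc_uid" -eq "$owner_uid" ]; then
        bits=$(( ($mode >> 6) & 7 ))      # owner triplet
    elif [ "$proc_gid" -eq "$owner_gid" ]; then
        bits=$(( ($mode >> 3) & 7 ))      # group triplet
    else
        bits=$(( $mode & 7 ))             # "other" triplet
    fi
    [ $(( bits & 2 )) -ne 0 ]             # 2 is the write bit
}

# Values from the stat output: mode 0755, owner 0:0, process uid 1000.
if can_write 0755 0 0 1000 1000; then
    echo "writable"
else
    echo "not writable"
fi
```

With mode 0755 and owner 0:0, a process with uid 1000 falls into the "other" triplet (r-x), so the sketch prints "not writable", which is consistent with curl's error 23.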