I am trying to set up a Docker swarm.
I need my nodes to communicate via TLS.
I have created a certificate for the manager node with extendedKeyUsage = serverAuth.
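For reference, a certificate like this can be produced roughly as follows. This is a sketch in the style of the Docker TLS documentation, not a transcript of my exact commands: the scratch directory, the file names, the CA subject, the subjectAltName entries and the validity period are all assumptions, and dockman.myhost.com is the manager host name used in the join command further down.

```shell
#!/bin/sh
set -e

# Work in a scratch directory; every file name below is illustrative.
workdir=$(mktemp -d)
cd "$workdir"

# Throwaway CA key and self-signed CA certificate (assumed setup).
openssl genrsa -out ca-key.pem 2048 2>/dev/null
openssl req -new -x509 -days 365 -sha256 -key ca-key.pem \
  -subj "/CN=example-ca" -out ca.pem

# Key and signing request for the manager node.
openssl genrsa -out server-key.pem 2048 2>/dev/null
openssl req -new -sha256 -key server-key.pem \
  -subj "/CN=dockman.myhost.com" -out server.csr

# Extensions applied when the CA signs the request.
# Only serverAuth is listed, as in the setup described above.
cat > extfile.cnf <<'EOF'
subjectAltName = DNS:dockman.myhost.com,IP:127.0.0.1
extendedKeyUsage = serverAuth
EOF

openssl x509 -req -days 365 -sha256 -in server.csr \
  -CA ca.pem -CAkey ca-key.pem -CAcreateserial \
  -out server-cert.pem -extfile extfile.cnf 2>/dev/null

# Print the extended key usage that ended up in the certificate.
eku=$(openssl x509 -in server-cert.pem -noout -text | grep -A1 'Extended Key Usage')
echo "$eku"
```

The last command is only there to confirm which extended key usage values the signed certificate actually carries.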
I have configured the manager node with the following daemon.json:
{
"hosts": ["unix:///var/run/docker.sock", "tcp://0.0.0.0:2376"],
"tlscacert": "/var/docker/ca.pem",
"tlscert": "/var/docker/server-cert.pem",
"tlskey": "/var/docker/server-key.pem",
"tlsverify": true
}
To test this, I created a client certificate and used it to connect to the Docker API from my laptop. The connection succeeds.
Now I need to add one worker node to the swarm.
I have set it up in the same way as the manager node, with a similar daemon.json. I used a certificate with extendedKeyUsage = serverAuth and verified the client connection in the same way as on the manager node.
Then, on the manager, I ran docker swarm init.
To join the worker node to the swarm I use the following command: docker swarm join --token XXX dockman.myhost.com:2376
But I get an error:
Error response from daemon: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"
I thought I could test this further by trying to connect to the Docker API on the manager node from the worker node:
sudo docker --tlsverify --tlscacert=/var/docker/ca.pem --tlscert=./server-cert.pem --tlskey=./server-key.pem -H=127.0.0.1:2376 version
The result is:
Client: Docker Engine - Community
Version: 19.03.5
API version: 1.40
Go version: go1.12.12
Git commit: 633a0ea838
Built: Wed Nov 13 07:29:52 2019
OS/Arch: linux/amd64
Experimental: false
The server probably has client authentication (--tlsverify) enabled. Please check your TLS client certification settings: Get https://127.0.0.1:2376/v1.40/version: remote error: tls: bad certificate
This second test has given me a lot more to think about. Of course it fails, because I am connecting with a server certificate rather than a client certificate. But isn't that exactly what docker swarm join is trying to do? It doesn't make sense to me to put a client certificate into daemon.json. I searched for ways to make a single certificate valid for both server and client authentication. It is possible, but it seems to be considered bad practice, and I would have expected the tutorial to cover it if it were required.
I am stuck at this point and can't work out what certificate setup is required.
I have been following https://github.com/docker/docker.github.io/blob/master/swarm/configure-tls.md which describes the creation of certificates but doesn't mention client or server authentication at all.
Update 1
I found a document that says the certificates need to allow both client and server authentication:
https://hub.docker.com/_/swarm/
So I remade the node certificate to allow both client and server authentication. Now the docker version command works when run from the node, but docker swarm join still fails.
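To rule out a mistake in the remade certificate, the purposes a certificate allows can be inspected with openssl. The sketch below does not use my real certificates. It creates two throwaway self-signed certificates (hypothetical names server-only and server-and-client) and compares how openssl classifies them. It assumes OpenSSL 1.1.1 or newer, since it relies on the -addext option of openssl req.

```shell
#!/bin/sh
set -e

tmp=$(mktemp -d)

# make_cert NAME EKU
# Create a throwaway self-signed certificate whose extendedKeyUsage is EKU.
make_cert() {
  openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -subj "/CN=$1" -addext "extendedKeyUsage = $2" \
    -keyout "$tmp/$1-key.pem" -out "$tmp/$1-cert.pem" 2>/dev/null
}

# ssl_purposes FILE
# Print only the "SSL client" and "SSL server" lines of the purpose report.
ssl_purposes() {
  openssl x509 -in "$1" -noout -purpose | grep -E '^SSL (client|server) :'
}

make_cert server-only "serverAuth"
make_cert server-and-client "serverAuth, clientAuth"

echo "serverAuth only:"
ssl_purposes "$tmp/server-only-cert.pem"

echo "serverAuth + clientAuth:"
ssl_purposes "$tmp/server-and-client-cert.pem"
```

Running the same openssl x509 -noout -purpose check against the real node certificate shows whether it is accepted as an SSL client, which is what the docker version test from the node depends on.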