
I am trying to create a custom cluster by adding two VMs as nodes.

I create the cluster by setting its name and selecting the roles for each node (etcd, controlplane and worker), and then run the generated registration command on each node.

After waiting several minutes, I see the following error:

[[network] Host [X.Y.Z.14] is not able to connect to the following ports: [X.Y.Z.10:2379, X.Y.Z.10:2380]. Please check network policies and firewall rules]

X.Y.Z.10 and X.Y.Z.14 are the IP addresses of the two nodes being added to the cluster. The Rancher server is X.Y.Z.9 and has none of these roles.
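To check the failing connection by hand, outside Rancher, one option is to attempt a plain TCP connection from X.Y.Z.14 to ports 2379 and 2380 on X.Y.Z.10. The sketch below assumes a successful TCP handshake is roughly what the port check verifies; it is not rke-port-checker's actual code, and `can_connect` and `unreachable_ports` are hypothetical helper names.

```python
from __future__ import annotations

import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake with host:port completes within timeout."""
    try:
        # create_connection resolves the address and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable hosts and DNS errors.
        return False


def unreachable_ports(host: str, ports: list[int], timeout: float = 3.0) -> list[str]:
    """Return "host:port" strings for every port that could not be reached."""
    return [f"{host}:{p}" for p in ports if not can_connect(host, p, timeout)]
```

For example, on X.Y.Z.14 you would call `unreachable_ports("X.Y.Z.10", [2379, 2380])` with the real address substituted. A quick "connection refused" usually means nothing is listening on that port, while a timeout more often points to a firewall silently dropping packets.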

All three VMs (the server and the two nodes) are freshly installed CentOS 7. I ran this setup with SELinux enabled, and I also tried disabling SELinux on all three VMs to rule out a conflict between SELinux and Rancher.

Am I missing a step? Where should I be looking? I have checked the logs of the Rancher server container; here is part of the log:

2019/12/02 12:10:26 [INFO] kontainerdriver rancherkubernetesengine stopped
2019/12/02 12:10:26 [ERROR] ClusterController c-mb7xc [cluster-provisioner-controller] failed with : [[network] Host [X.Y.Z.14] is not able to connect to the following ports: [X.Y.Z.10:2379, X.Y.Z.10:2380]. Please check network policies and firewall rules]
2019-12-02 12:13:26.885195 I | mvcc: store.index: compact 115706
2019-12-02 12:13:26.886955 I | mvcc: finished scheduled compaction at 115706 (took 1.379118ms)
2019/12/02 12:14:26 [INFO] Provisioning cluster [c-mb7xc]
2019/12/02 12:14:26 [INFO] Creating cluster [c-mb7xc]
2019/12/02 12:14:31 [INFO] kontainerdriver rancherkubernetesengine listening on address 127.0.0.1:42728
2019/12/02 12:14:31 [ERROR] Cluster c-mb7xc previously failed to create
2019/12/02 12:14:31 [INFO] cluster [c-mb7xc] provisioning: Initiating Kubernetes cluster
2019/12/02 12:14:31 [INFO] cluster [c-mb7xc] provisioning: [certificates] Generating admin certificates and kubeconfig
2019/12/02 12:14:31 [INFO] cluster [c-mb7xc] provisioning: Successfully Deployed state file at [management-state/rke/rke-770316984/cluster.rkestate]
2019/12/02 12:14:31 [INFO] cluster [c-mb7xc] provisioning: Building Kubernetes cluster
2019/12/02 12:14:31 [INFO] cluster [c-mb7xc] provisioning: [dialer] Setup tunnel for host [X.Y.Z.14]
2019/12/02 12:14:31 [INFO] [network] Starting stopped container [rke-etcd-port-listener] on host [X.Y.Z.10]
2019/12/02 12:14:31 [INFO] Starting container [rke-etcd-port-listener] on host [X.Y.Z.10], try #1
2019/12/02 12:14:31 [INFO] [network] Starting stopped container [rke-etcd-port-listener] on host [X.Y.Z.14]
2019/12/02 12:14:31 [INFO] Starting container [rke-etcd-port-listener] on host [X.Y.Z.14], try #1
2019/12/02 12:14:31 [INFO] cluster [c-mb7xc] provisioning: [dialer] Setup tunnel for host [X.Y.Z.10]
2019/12/02 12:14:31 [INFO] cluster [c-mb7xc] provisioning: [network] Deploying port listener containers
2019/12/02 12:14:31 [WARNING] Can't start Docker container [rke-etcd-port-listener] on host [X.Y.Z.10]: Error response from daemon: driver failed programming external connectivity on endpoint rke-etcd-port-listener (5a38613b1495ef436cd7842ade853e6f2a11948f5f00f0d2a0ff0d57e83aa115): Error starting userland proxy: listen tcp 0.0.0.0:2380: bind: address already in use
2019/12/02 12:14:31 [INFO] Starting container [rke-etcd-port-listener] on host [X.Y.Z.10], try #2
2019/12/02 12:14:31 [WARNING] Can't start Docker container [rke-etcd-port-listener] on host [X.Y.Z.14]: Error response from daemon: driver failed programming external connectivity on endpoint rke-etcd-port-listener (445b5b6cbaf4a2078f15d44741b91245d4f63288bb1ad3894787f9060ada4e33): Error starting userland proxy: listen tcp 0.0.0.0:2380: bind: address already in use
2019/12/02 12:14:31 [INFO] Starting container [rke-etcd-port-listener] on host [X.Y.Z.14], try #2
2019/12/02 12:14:32 [WARNING] Can't start Docker container [rke-etcd-port-listener] on host [X.Y.Z.10]: Error response from daemon: driver failed programming external connectivity on endpoint rke-etcd-port-listener (d5a2cfc270aab68cee979b2fe1705a2ff574ba167f0ad011d2626e4edc94ac01): Error starting userland proxy: listen tcp 0.0.0.0:2380: bind: address already in use
2019/12/02 12:14:32 [INFO] Starting container [rke-etcd-port-listener] on host [X.Y.Z.10], try #3
2019/12/02 12:14:32 [WARNING] Can't start Docker container [rke-etcd-port-listener] on host [X.Y.Z.14]: Error response from daemon: driver failed programming external connectivity on endpoint rke-etcd-port-listener (1e7463f5c50b7d967824a380695cbf6f73e1c8f13368c6e12712330e64d6a358): Error starting userland proxy: listen tcp 0.0.0.0:2380: bind: address already in use
2019/12/02 12:14:32 [INFO] Starting container [rke-etcd-port-listener] on host [X.Y.Z.14], try #3
2019/12/02 12:14:32 [WARNING] Can't start Docker container [rke-etcd-port-listener] on host [X.Y.Z.10]: Error response from daemon: driver failed programming external connectivity on endpoint rke-etcd-port-listener (81941519760f80c47c05f5a44c8076adfb796a6675201614931b51bbb7b63714): Error starting userland proxy: listen tcp 0.0.0.0:2380: bind: address already in use
2019/12/02 12:14:32 [WARNING] Can't start Docker container [rke-etcd-port-listener] on host [X.Y.Z.14]: Error response from daemon: driver failed programming external connectivity on endpoint rke-etcd-port-listener (9fff51bada3afffc9c14a7c5ddf5f25e889b71a2d128dca8d7cda8c56fa7fed4): Error starting userland proxy: listen tcp 0.0.0.0:2380: bind: address already in use
2019/12/02 12:14:32 [INFO] cluster [c-mb7xc] provisioning: [network] Port listener containers deployed successfully
2019/12/02 12:14:32 [INFO] Image [rancher/rke-tools:v0.1.51] exists on host [X.Y.Z.14]
2019/12/02 12:14:32 [INFO] Image [rancher/rke-tools:v0.1.51] exists on host [X.Y.Z.10]
2019/12/02 12:14:32 [INFO] cluster [c-mb7xc] provisioning: [network] Running etcd <-> etcd port checks
2019/12/02 12:14:32 [INFO] Starting container [rke-port-checker] on host [X.Y.Z.14], try #1
2019/12/02 12:14:32 [INFO] cluster [c-mb7xc] provisioning: [network] Successfully started [rke-port-checker] container on host [X.Y.Z.14]
2019/12/02 12:14:32 [INFO] Removing container [rke-port-checker] on host [X.Y.Z.14], try #1
2019/12/02 12:14:32 [INFO] Starting container [rke-port-checker] on host [X.Y.Z.10], try #1
2019/12/02 12:14:32 [INFO] cluster [c-mb7xc] provisioning: [network] Successfully started [rke-port-checker] container on host [X.Y.Z.10]
2019/12/02 12:14:38 [INFO] Removing container [rke-port-checker] on host [X.Y.Z.10], try #1
2019/12/02 12:14:38 [ERROR] cluster [c-mb7xc] provisioning: [[network] Host [X.Y.Z.14] is not able to connect to the following ports: [X.Y.Z.10:2379, X.Y.Z.10:2380]. Please check network policies and firewall rules]
2019/12/02 12:14:38 [INFO] kontainerdriver rancherkubernetesengine stopped
2019/12/02 12:14:38 [ERROR] ClusterController c-mb7xc [cluster-provisioner-controller] failed with : [[network] Host [X.Y.Z.14] is not able to connect to the following ports: [X.Y.Z.10:2379, X.Y.Z.10:2380]. Please check network policies and firewall rules]
2019-12-02 12:18:26.889193 I | mvcc: store.index: compact 116351
2019-12-02 12:18:26.890642 I | mvcc: finished scheduled compaction at 116351 (took 1.10593ms)
2019/12/02 12:22:38 [INFO] Provisioning cluster [c-mb7xc]
2019/12/02 12:22:38 [INFO] Creating cluster [c-mb7xc]
2019/12/02 12:22:43 [INFO] kontainerdriver rancherkubernetesengine listening on address 127.0.0.1:33176
2019/12/02 12:22:43 [ERROR] Cluster c-mb7xc previously failed to create
2019/12/02 12:22:43 [INFO] cluster [c-mb7xc] provisioning: Initiating Kubernetes cluster
2019/12/02 12:22:43 [INFO] cluster [c-mb7xc] provisioning: [certificates] Generating admin certificates and kubeconfig
2019/12/02 12:22:43 [INFO] cluster [c-mb7xc] provisioning: Successfully Deployed state file at [management-state/rke/rke-153618103/cluster.rkestate]
2019/12/02 12:22:43 [INFO] cluster [c-mb7xc] provisioning: Building Kubernetes cluster
2019/12/02 12:22:43 [INFO] cluster [c-mb7xc] provisioning: [dialer] Setup tunnel for host [X.Y.Z.10]
2019/12/02 12:22:43 [INFO] cluster [c-mb7xc] provisioning: [dialer] Setup tunnel for host [X.Y.Z.14]
2019/12/02 12:22:43 [INFO] [network] Starting stopped container [rke-etcd-port-listener] on host [X.Y.Z.14]
2019/12/02 12:22:43 [INFO] Starting container [rke-etcd-port-listener] on host [X.Y.Z.14], try #1
2019/12/02 12:22:43 [INFO] [network] Starting stopped container [rke-etcd-port-listener] on host [X.Y.Z.10]
2019/12/02 12:22:43 [INFO] Starting container [rke-etcd-port-listener] on host [X.Y.Z.10], try #1
2019/12/02 12:22:43 [INFO] cluster [c-mb7xc] provisioning: [network] Deploying port listener containers
2019/12/02 12:22:43 [WARNING] Can't start Docker container [rke-etcd-port-listener] on host [X.Y.Z.14]: Error response from daemon: driver failed programming external connectivity on endpoint rke-etcd-port-listener (7ef490bf1c3963f131972836836d7f01acf0a7f9f808eede2cf19e57e4b3c62c): Error starting userland proxy: listen tcp 0.0.0.0:2380: bind: address already in use
2019/12/02 12:22:43 [INFO] Starting container [rke-etcd-port-listener] on host [X.Y.Z.14], try #2
2019/12/02 12:22:43 [WARNING] Can't start Docker container [rke-etcd-port-listener] on host [X.Y.Z.10]: Error response from daemon: driver failed programming external connectivity on endpoint rke-etcd-port-listener (f42e672d345f9468871fcc130c432885dde17b70bda4f2dc23d1f7f443ecac6e): Error starting userland proxy: listen tcp 0.0.0.0:2380: bind: address already in use
2019/12/02 12:22:43 [INFO] Starting container [rke-etcd-port-listener] on host [X.Y.Z.10], try #2
2019/12/02 12:22:43 [WARNING] Can't start Docker container [rke-etcd-port-listener] on host [X.Y.Z.10]: Error response from daemon: driver failed programming external connectivity on endpoint rke-etcd-port-listener (11c409cf4e33232e2c5d39ae60981620793ca531482a8f09677e7c3e47750df6): Error starting userland proxy: listen tcp 0.0.0.0:2380: bind: address already in use
2019/12/02 12:22:43 [INFO] Starting container [rke-etcd-port-listener] on host [X.Y.Z.10], try #3
2019/12/02 12:22:43 [WARNING] Can't start Docker container [rke-etcd-port-listener] on host [X.Y.Z.14]: Error response from daemon: driver failed programming external connectivity on endpoint rke-etcd-port-listener (26df18e96a9a328a390a2d3a832cf665c7ef455e46058b0748f0e63e6c356612): Error starting userland proxy: listen tcp 0.0.0.0:2380: bind: address already in use
2019/12/02 12:22:43 [INFO] Starting container [rke-etcd-port-listener] on host [X.Y.Z.14], try #3
2019/12/02 12:22:44 [WARNING] Can't start Docker container [rke-etcd-port-listener] on host [X.Y.Z.10]: Error response from daemon: driver failed programming external connectivity on endpoint rke-etcd-port-listener (142ef4546b5b9afb113ef7282970e84dce1131dce21e32caafde54d870838792): Error starting userland proxy: listen tcp 0.0.0.0:2380: bind: address already in use
2019/12/02 12:22:44 [WARNING] Can't start Docker container [rke-etcd-port-listener] on host [X.Y.Z.14]: Error response from daemon: driver failed programming external connectivity on endpoint rke-etcd-port-listener (8e0eda16eb1e99088e4bd2dd3f5134bf6230fdc03dd10aac24c76e6d71826ac3): Error starting userland proxy: listen tcp 0.0.0.0:2380: bind: address already in use
2019/12/02 12:22:44 [INFO] cluster [c-mb7xc] provisioning: [network] Port listener containers deployed successfully
2019/12/02 12:22:44 [INFO] Image [rancher/rke-tools:v0.1.51] exists on host [X.Y.Z.14]
2019/12/02 12:22:44 [INFO] Image [rancher/rke-tools:v0.1.51] exists on host [X.Y.Z.10]
2019/12/02 12:22:44 [INFO] cluster [c-mb7xc] provisioning: [network] Running etcd <-> etcd port checks
2019/12/02 12:22:44 [INFO] Starting container [rke-port-checker] on host [X.Y.Z.14], try #1
2019/12/02 12:22:44 [INFO] cluster [c-mb7xc] provisioning: [network] Successfully started [rke-port-checker] container on host [X.Y.Z.14]
2019/12/02 12:22:44 [INFO] Starting container [rke-port-checker] on host [X.Y.Z.10], try #1
2019/12/02 12:22:44 [INFO] Removing container [rke-port-checker] on host [X.Y.Z.14], try #1
2019/12/02 12:22:44 [INFO] cluster [c-mb7xc] provisioning: [network] Successfully started [rke-port-checker] container on host [X.Y.Z.10]
2019/12/02 12:22:49 [INFO] Removing container [rke-port-checker] on host [X.Y.Z.10], try #1
2019/12/02 12:22:49 [ERROR] cluster [c-mb7xc] provisioning: [[network] Host [X.Y.Z.14] is not able to connect to the following ports: [X.Y.Z.10:2379, X.Y.Z.10:2380]. Please check network policies and firewall rules]
2019/12/02 12:22:49 [INFO] kontainerdriver rancherkubernetesengine stopped
2019/12/02 12:22:49 [ERROR] ClusterController c-mb7xc [cluster-provisioner-controller] failed with : [[network] Host [X.Y.Z.14] is not able to connect to the following ports: [X.Y.Z.10:2379, X.Y.Z.10:2380]. Please check network policies and firewall rules]
Miguel Mesquita Alfaiate

2 Answers


All the warnings say `listen tcp 0.0.0.0:2380: bind: address already in use`. I'd suggest checking whether a service is already running on that port. If not, check whether a container (possibly stopped at the moment) has this port bound to itself.
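One way to reproduce the warning directly on a node is to try binding the port yourself and see whether the kernel returns `EADDRINUSE`, the same error Docker's userland proxy reported. A minimal sketch, assuming you run it on the node in question; `port_in_use` is a hypothetical helper, and `SO_REUSEADDR` is set so that leftover TIME_WAIT connections are not mistaken for a live listener.

```python
import errno
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if binding host:port fails with EADDRINUSE."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # On Linux, SO_REUSEADDR still refuses the bind while another socket
    # is actively listening on the port, which is what we want to detect.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        sock.bind((host, port))
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return True
        raise  # e.g. EACCES when binding a privileged port without root
    finally:
        sock.close()
    return False
```

Calling `port_in_use(2380)` on X.Y.Z.10 or X.Y.Z.14 would be expected to return True while whatever holds the port is still running.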

Use `docker container ls -a` to list all containers, including those that are not running. On Linux, use `netstat -tulpen | grep 2380` to list the services listening on port 2380. On Windows, run `netstat -an | findstr 2380` in a command prompt to do the same.
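On Linux, `netstat` gets its socket information from the kernel tables under `/proc/net`. If a node does not have `netstat` installed (minimal CentOS 7 installs may lack it), the same listening-socket lookup can be approximated with a short script. This sketch assumes a Linux host. `listeners_on_port` is a hypothetical helper, and it reports the socket inode rather than the owning process.

```python
from __future__ import annotations

import socket
import struct

TCP_LISTEN = "0A"  # state code for LISTEN in /proc/net/tcp and /proc/net/tcp6


def _decode_addr(hex_addr: str) -> str:
    """Decode the kernel's hex address (native-endian 32-bit words) to text."""
    raw = b"".join(
        struct.pack("=I", int(hex_addr[i:i + 8], 16))
        for i in range(0, len(hex_addr), 8)
    )
    family = socket.AF_INET if len(raw) == 4 else socket.AF_INET6
    return socket.inet_ntop(family, raw)


def listeners_on_port(port: int) -> list[tuple[str, int, int]]:
    """Return (address, port, socket inode) for TCP sockets listening on port."""
    found = []
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        try:
            with open(path) as fh:
                next(fh)  # skip the column header line
                for line in fh:
                    fields = line.split()
                    local, state, inode = fields[1], fields[3], int(fields[9])
                    addr_hex, port_hex = local.split(":")
                    if state == TCP_LISTEN and int(port_hex, 16) == port:
                        found.append((_decode_addr(addr_hex), port, inode))
        except FileNotFoundError:
            continue  # e.g. IPv6 disabled, or not running on Linux
    return found
```

An empty result for `listeners_on_port(2380)` would suggest that nothing on the host's network namespace is listening on that port.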

7_R3X
  • I don't quite understand what might be wrong here. There is nothing running on the server except the Rancher containers. I am trying to get the 2 nodes to run the etcd, controlplane and worker roles. The port does indeed seem to be in use already, but by a Rancher container? https://snipboard.io/dZL0l5.jpg – Miguel Mesquita Alfaiate Dec 02 '19 at 18:41
  • @BlunT : I see that the port is held by a process named "etcd", not by a container. Containers usually show up as "docker-proxy" or "dockerd". Check whether you can stop that process. Once you have, restart the container and see if it works. Let me know what happens. – 7_R3X Dec 03 '19 at 11:45
  • Isn't etcd one of the roles of Rancher nodes? I am adding hosts with the etcd, controlplane and worker roles. – Miguel Mesquita Alfaiate Dec 04 '19 at 10:13
  • @BlunT: Can you share the command whose output you have posted? Share the docker-compose.yaml as well, if you're using one. – 7_R3X Dec 04 '19 at 11:32
  • @BlunT: Did you fire the `netstat` command from inside the container or on the machine that's running the container? – 7_R3X Dec 04 '19 at 11:35
  • Yes, I did; the screenshot is in my first comment on your answer. – Miguel Mesquita Alfaiate Dec 04 '19 at 11:40
  • @BlunT: Did you fire it inside the container or on the machine? – 7_R3X Dec 04 '19 at 11:47
  • On the machine, not the container. The port is bound on the machine. – Miguel Mesquita Alfaiate Dec 04 '19 at 14:20
  • @BlunT: In that case, I believe there's an instance of `etcd` running on your machine, apart from the docker container. – 7_R3X Dec 04 '19 at 14:28
  • I am not sure, but I believe this issue is related to a bug already open in Rancher: https://github.com/rancher/rke/issues/1760 Apparently there is a bug in the cluster creation validations. I am still waiting for a reply to my message. – Miguel Mesquita Alfaiate Dec 04 '19 at 19:09
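The thread above depends on whether port 2380 is held by an `etcd` process or by Docker's `docker-proxy`. One way to check that reading independently is to map a socket inode (the `inode` column of `/proc/net/tcp`) to the processes that hold it open, which is roughly what `netstat -p` does. This is a hedged, Linux-only sketch. `processes_for_inode` is a hypothetical helper, and it needs root to see processes owned by other users.

```python
from __future__ import annotations

import os


def processes_for_inode(inode: int) -> list[tuple[int, str]]:
    """Return (pid, command name) pairs for processes holding socket:[inode]."""
    target = f"socket:[{inode}]"
    owners = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/net
        fd_dir = os.path.join("/proc", entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # permission denied, or the process already exited
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) != target:
                    continue
                with open(os.path.join("/proc", entry, "comm")) as fh:
                    owners.append((int(entry), fh.read().strip()))
            except OSError:
                continue  # fd closed or process exited while scanning
            break  # one match per process is enough
    return owners
```

Based on the screenshot described in the comments, running this as root for the inode of the 2380 listener would be expected to return a process named `etcd` rather than `docker-proxy`.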

I was able to get past this issue by starting from scratch with Ubuntu and a newer version of Rancher.

I don't believe the operating system was the issue here; rather, there was a known problem in that Rancher version.

Miguel Mesquita Alfaiate