
I am new to Rancher and containers in general. While setting up a Kubernetes cluster using Rancher, I'm facing a problem accessing the Kubernetes dashboard.

rancher/server: 1.6.6

Single node Rancher server + External MySQL + 3 agent nodes

Infrastructure Stack versions:
healthcheck: v0.3.1
ipsec: net:v0.11.5
network-services: metadata:v0.9.2 / network-manager:v0.7.7
scheduler: k8s:v1.7.2-rancher5
kubernetes (if applicable): kubernetes-agent:v0.6.3


# docker info
Containers: 1
Running: 1
Paused: 0
Stopped: 0
Images: 1
Server Version: 17.03.1-ce
Storage Driver: overlay
Backing Filesystem: extfs
Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 4ab9917febca54791c5f071a9d1f404867857fcc
runc version: 54296cf40ad8143b62dbcaa1d90e520a2136ddfe
init version: 949e6fa
Security Options:
seccomp
Profile: default
Kernel Version: 4.9.34-rancher
Operating System: RancherOS v1.0.3
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 7.798 GiB
Name: ch7radod1
ID: IUNS:4WT2:Y3TV:2RI4:FZQO:4HYD:YSNN:6DPT:HMQ6:S2SI:OPGH:TX4Y
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Http Proxy: http://proxy.ch.abc.net:8080
Https Proxy: http://proxy.ch.abc.net:8080
No Proxy: localhost,.xyz.net,abc.net
Registry: https://index.docker.io/v1/
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false

Accessing the UI URL http://10.216.30.10/r/projects/1a6633/kubernetes-dashboard:9090/# shows “Service unavailable”.

If I use the CLI section from the UI, I get the following:

> kubectl get nodes
NAME              STATUS    AGE       VERSION
ch7radod3       Ready     1d        v1.7.2
ch7radod4       Ready     5d        v1.7.2
ch7radod1       Ready     1d        v1.7.2

> kubectl get pods --all-namespaces
NAMESPACE     NAME                                   READY     STATUS              RESTARTS   AGE
kube-system   heapster-4285517626-4njc2              0/1       ContainerCreating   0          5d
kube-system   kube-dns-3942128195-ft56n              0/3       ContainerCreating   0          19d
kube-system   kube-dns-646531078-z5lzs               0/3       ContainerCreating   0          5d
kube-system   kubernetes-dashboard-716739405-lpj38   0/1       ContainerCreating   0          5d
kube-system   monitoring-grafana-3552275057-qn0zf    0/1       ContainerCreating   0          5d
kube-system   monitoring-influxdb-4110454889-79pvk   0/1       ContainerCreating   0          5d
kube-system   tiller-deploy-737598192-f9gcl          0/1       ContainerCreating   0          5d

The setup uses a private registry (Artifactory). I checked Artifactory and could see several Docker-related images present. I was going through the private registry section and also saw this file. In case this file is required, where exactly do I keep it so that Rancher can fetch it and configure the Kubernetes dashboard?
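For what it's worth, mirroring the add-on images into a private registry usually amounts to a pull/tag/push per image. A hedged sketch (the registry host is taken from this setup, but the image names and tags are assumptions; match them to the exact versions your Rancher Kubernetes template references):

```shell
# Sketch: mirror Kubernetes add-on images into a private registry.
# The image list below is illustrative, not the authoritative set.
REGISTRY="docker.artifactory.abc.net"

mirror_target() {
    # Rewrite e.g. gcr.io/google_containers/pause-amd64:3.0
    # to docker.artifactory.abc.net/google_containers/pause-amd64:3.0
    echo "$REGISTRY/${1#*/}"
}

for IMAGE in \
    gcr.io/google_containers/kubernetes-dashboard-amd64:v1.6.1 \
    gcr.io/google_containers/pause-amd64:3.0
do
    TARGET=$(mirror_target "$IMAGE")
    # Shown as a dry run; remove the echo prefixes to actually mirror.
    echo "docker pull $IMAGE"
    echo "docker tag $IMAGE $TARGET"
    echo "docker push $TARGET"
done
```

The Add-ons registry field in the Kubernetes template then has to point at the same registry prefix so the kubelet pulls from it instead of gcr.io.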

UPDATE:

$ sudo ros engine switch docker-1.12.6
> ERRO[0031] Failed to load https://raw.githubusercontent.com/rancher/os-services/v1.0.3/index.yml: Get https://raw.githubusercontent.com/rancher/os-services/v1.0.3/index.yml: Proxy Authentication Required
> FATA[0031] docker-1.12.6 is not a valid engine

I thought it might be due to NGINX, so I stopped the NGINX container, but I am still getting the above error. I have run this same command on the Rancher server before and it used to work fine. It works fine on the agent nodes, although they already have 1.12.6 configured.
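The `Proxy Authentication Required` error suggests the download is being routed through the proxy configured on this node. One quick way to compare this node against the agent nodes is to dump the proxy-related environment variables on each and diff them (a generic sketch; these are the conventional variable names, not anything RancherOS-specific):

```shell
# Print the proxy-related environment variables so nodes can be diffed.
show_proxy_env() {
    for VAR in http_proxy https_proxy no_proxy HTTP_PROXY HTTPS_PROXY NO_PROXY; do
        # printenv exits non-zero for unset variables; print an empty value then.
        printf '%s=%s\n' "$VAR" "$(printenv "$VAR" || true)"
    done
}
show_proxy_env
```

Running this on the Rancher server and on one agent node, then diffing the output, would show whether the server picked up proxy settings the agents no longer have.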

UPDATE 2:

> kubectl -n kube-system get po
NAME                                 READY STATUS            RESTARTS AGE
heapster-4285517626-4njc2            1/1   Running           0        12d
kube-dns-2588877561-26993            0/3   ImagePullBackOff  0        5h
kube-dns-646531078-z5lzs             0/3   ContainerCreating 0        12d
kubernetes-dashboard-716739405-zq3s9 0/1   CrashLoopBackOff  67       5h
monitoring-grafana-3552275057-qn0zf  1/1   Running           0        12d
monitoring-influxdb-4110454889-79pvk 1/1   Running           0        12d
tiller-deploy-737598192-f9gcl        0/1   CrashLoopBackOff  72       12d
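A small helper (not part of the original setup) can reduce listings like the one above to only the unhealthy pods, i.e. rows whose READY column is not full or whose STATUS is not Running:

```shell
# Filter `kubectl get po --no-headers` output down to unhealthy pods:
# keep rows where the READY column (e.g. 0/3) is not full, or where
# STATUS is anything other than Running.
filter_unhealthy() {
    awk '{ split($2, a, "/"); if (a[1] != a[2] || $3 != "Running") print $1, $3 }'
}

# Example usage against a live cluster:
#   kubectl -n kube-system get po --no-headers | filter_unhealthy
```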
Technext
  • What version of RancherOS? – ivan.sim Aug 14 '17 at 04:04
  • RancherOS v1.0.3 – Technext Aug 14 '17 at 04:08
  • Are you behind a proxy? Can you reach `https://raw.githubusercontent.com/rancher/os-services/v1.0.3/index.yml` by using `curl` or something? – ivan.sim Aug 14 '17 at 04:11
  • `curl` is not available on Rancher. All agent and Rancher server fetch images via Artifactory. I remember very well that this command (`ros engine switch`) used to work earlier on this machine. After NGINX setup, this might have stopped working. I stopped the NGINX container but still it's not working. `sudo ros engine list` is working fine on agent nodes. – Technext Aug 14 '17 at 04:20
  • I tried `wget` but it's not working. Just keeps waiting because it's not connected to Internet. – Technext Aug 14 '17 at 04:23
  • What about other nodes in your cluster? Can they reach the external URL? – ivan.sim Aug 14 '17 at 05:21
  • Yes, on other nodes, `ros engine switch` works fine. I am confused how come they are able to access but this instance can't. I compared env variables of server and other nodes but they all look similar. Also, there is nothing in any initialization file (`.bashrc,.bash_profile,/etc/profile*`) on any instance. – Technext Aug 14 '17 at 06:09
  • The person who did the setup earlier had used his own credential for proxy and once it started working, he removed the proxy settings from the config of all the instances. Since the agent machines were not restarted (but server instance was) after that, things were working fine there but not on server instance. I am again back to my original issue because even after changing the Docker version to 1.12.6 on server (agent nodes are already on 1.12.6), Kubernetes dashboard shows Service Unavailable. – Technext Aug 16 '17 at 11:57
  • In this [link](http://rancher.com/docs/rancher/v1.6/en/kubernetes/private-registry/), for Helm, Dashboard etc, it mentions copying the exact version of images. I have not performed any step mentioned here. Is that the reason for dashboard not working? What exactly needs to be copied and where? By the way, I am using private registry (Artifactory). – Technext Aug 16 '17 at 11:57
  • The dashboard, tiller, monitoring and kubedns are all part of the kubernetes [addons](http://rancher.com/docs/rancher/v1.6/en/kubernetes/addons/). Check to see if that's what your kubernetes stack is [configured](http://rancher.com/docs/rancher/v1.6/en/kubernetes/#configuring-kubernetes) to use. – ivan.sim Aug 16 '17 at 15:52
  • Try running `kubectl -n kube-system get po` to see if the pods are installed and what state they are in. – ivan.sim Aug 16 '17 at 15:53
  • I could not find anything relevant in the template except for the section where we have to specify the private registry name (`docker.artifactory.abc.net`) for Add-ons and Pod Infra Container Image. That's already configured. For kubectl command output, please check UPDATE2. – Technext Aug 16 '17 at 16:07
  • Check for error logs and events with `kubectl -n kube-system logs kubernetes-dashboard-716739405-zq3s9` and `kubectl -n kube-system describe po kubernetes-dashboard-716739405-zq3s9`. – ivan.sim Aug 16 '17 at 17:02
  • Thanks @ivan.sim for all your inputs. I really appreciate it. :) – Technext Aug 17 '17 at 06:40

2 Answers


None of your pods are running; you need to resolve that issue first. Try restarting the whole cluster and verify that all of the above pods reach Running status.
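For the stuck add-on pods specifically, one way to force a restart is to delete them and let their deployments recreate them. A dry-run sketch (pod names taken from the question; it only prints the commands, drop the `echo` wrapper to actually execute):

```shell
# Build the delete commands for stuck kube-system pods (dry run).
restart_cmds() {
    for POD in "$@"; do
        echo "kubectl -n kube-system delete pod $POD"
    done
}

restart_cmds kube-dns-3942128195-ft56n kubernetes-dashboard-716739405-lpj38
```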

sfgroups
  • The issue seems to be the Docker version's (17.03.1-ce) compatibility with Kubernetes. I was aware of this but forgot while fixing other stuff. While doing all this, I recently put NGINX in front for HTTPS. Now when I try to change the Docker version to 1.12.6, I get an error message. Please see UPDATE in my post. – Technext Aug 14 '17 at 03:51

Based on @ivan.sim's suggestion, I posted 'UPDATE 2'. This finally pointed me in the right direction. I then searched online for the CrashLoopBackOff error, came across this link, and tried the following command (using the CLI option from the Rancher console). It is quite similar to what @ivan.sim suggested above, but it also shows the node where the dashboard process was running:

> kubectl get pods -a -o wide --all-namespaces
NAMESPACE     NAME                                   READY  STATUS              RESTARTS   AGE  IP                  NODE
kube-system   heapster-4285517626-4njc2              1/1    Running             0          12d  10.42.224.157       radod4
kube-system   kube-dns-2588877561-26993              0/3    ImagePullBackOff    0          5h   <none>              radod1
kube-system   kube-dns-646531078-z5lzs               0/3    ContainerCreating   0          12d  <none>              radod4
kube-system   kubernetes-dashboard-716739405-zq3s9   0/1    Error               70         5h   10.42.218.11        radod1
kube-system   monitoring-grafana-3552275057-qn0zf    1/1    Running             0          12d  10.42.202.44        radod4
kube-system   monitoring-influxdb-4110454889-79pvk   1/1    Running             0          12d  10.42.111.171       radod4
kube-system   tiller-deploy-737598192-f9gcl          0/1    CrashLoopBackOff    76         12d  10.42.213.24        radod4

Then I went to the host where the container was running and tried the following commands:

[rancher@radod1 ~]$
[rancher@radod1 ~]$ docker ps -a | grep dash
282334b0ed38  gcr.io/google_containers/kubernetes-dashboard-amd64@sha256:b537ce8988510607e95b8d40ac9824523b1f9029e6f9f90e9fccc663c355cf5d  "/dashboard --insecur"   About a minute ago   Exited (1) 55 seconds ago   k8s_kubernetes-dashboard_kubernetes-dashboard-716739405-zq3s9_kube-system_7b0afda7-8271-11e7-ae86-021bfe69c163_72
99836d7824fd  gcr.io/google_containers/pause-amd64:3.0                                                                                     "/pause"                 5 hours ago          Up 5 hours                  k8s_POD_kubernetes-dashboard-716739405-zq3s9_kube-system_7b0afda7-8271-11e7-ae86-021bfe69c163_1
[rancher@radod1 ~]$
[rancher@radod1 ~]$
[rancher@radod1 ~]$ docker logs 282334b0ed38
Using HTTP port: 8443
Creating API server client for https://10.43.0.1:443
Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration) or the --apiserver-host param points to a server that does not exist. Reason: the server has asked for the client to provide credentials
Refer to the troubleshooting guide for more information: https://github.com/kubernetes/dashboard/blob/master/docs/user-guide/troubleshooting.md

After I got the above error, I searched online again and tried a few things. Finally, this link helped. After I executed the following commands on all agent nodes, the Kubernetes dashboard finally started working!

docker volume rm etcd
rm -rf /var/etcd/backups/*
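Since those two commands have to run on every agent node, they can be scripted over SSH. A sketch of that loop (node names and the `rancher` user are assumptions from this setup; it only prints the commands, because removing the etcd volume and backups is destructive and should be done deliberately):

```shell
# Dry run: print the per-node cleanup commands instead of executing them.
# WARNING: the printed commands delete local etcd state and backups.
cleanup_cmd() {
    echo "ssh rancher@$1 'docker volume rm etcd && sudo rm -rf /var/etcd/backups/*'"
}

for NODE in ch7radod1 ch7radod3 ch7radod4; do
    cleanup_cmd "$NODE"
done
```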
Technext