
To learn Kubernetes I've built myself a bare-metal cluster using 4 Raspberry Pis and set it up using k3s:

# curl -sfL https://get.k3s.io | sh -

Added the nodes etc., and everything comes up; I can see all the nodes and almost everything is working as expected.
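
For completeness, the worker Pis were joined roughly like this; <controller-ip> and <node-token> are placeholders for my own values, the token being the contents of /var/lib/rancher/k3s/server/node-token on the controller:

# curl -sfL https://get.k3s.io | K3S_URL=https://<controller-ip>:6443 K3S_TOKEN=<node-token> sh -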

I wanted to monitor the PIs so I installed the kube-prometheus-stack with helm:

$ kubectl create namespace monitoring
$ helm install prometheus --namespace monitoring prometheus-community/kube-prometheus-stack
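
(The chart repo was added beforehand in the usual way:)

$ helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
$ helm repo update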

And now everything looks fantastic:

$ kubectl get pods --all-namespaces 
NAMESPACE     NAME                                                     READY   STATUS      RESTARTS   AGE
kube-system   helm-install-traefik-crd-s8zw5                           0/1     Completed   0          5d21h
kube-system   helm-install-traefik-rc9f2                               0/1     Completed   1          5d21h
monitoring    prometheus-prometheus-node-exporter-j85rw                1/1     Running     10         28h
kube-system   metrics-server-86cbb8457f-mvbkl                          1/1     Running     12         5d21h
kube-system   coredns-7448499f4d-t7sp8                                 1/1     Running     13         5d21h
monitoring    prometheus-prometheus-node-exporter-mmh2q                1/1     Running     9          28h
monitoring    prometheus-prometheus-node-exporter-j4k4c                1/1     Running     10         28h
monitoring    alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running     10         28h
kube-system   svclb-traefik-zkqd6                                      2/2     Running     6          19h
monitoring    prometheus-prometheus-node-exporter-bft5t                1/1     Running     10         28h
kube-system   local-path-provisioner-5ff76fc89d-g8tm6                  1/1     Running     12         5d21h
kube-system   svclb-traefik-jcxd2                                      2/2     Running     28         5d21h
kube-system   svclb-traefik-mpbjm                                      2/2     Running     22         5d21h
kube-system   svclb-traefik-7kxtw                                      2/2     Running     20         5d21h
monitoring    prometheus-grafana-864598fd54-9548l                      2/2     Running     10         28h
kube-system   traefik-65969d48c7-9lh9m                                 1/1     Running     3          19h
monitoring    prometheus-prometheus-kube-prometheus-prometheus-0       2/2     Running     10         28h
monitoring    prometheus-kube-state-metrics-76f66976cb-m8k2h           1/1     Running     6          28h
monitoring    prometheus-kube-prometheus-operator-5c758db547-zsv4s     1/1     Running     6          28h

The services are all there:

$ kubectl get services --all-namespaces
NAMESPACE     NAME                                                 TYPE           CLUSTER-IP      EXTERNAL-IP                                                   PORT(S)                        AGE
default       kubernetes                                           ClusterIP      10.43.0.1       <none>                                                        443/TCP                        5d21h
kube-system   kube-dns                                             ClusterIP      10.43.0.10      <none>                                                        53/UDP,53/TCP,9153/TCP         5d21h
kube-system   metrics-server                                       ClusterIP      10.43.80.65     <none>                                                        443/TCP                        5d21h
kube-system   prometheus-kube-prometheus-kube-proxy                ClusterIP      None            <none>                                                        10249/TCP                      28h
kube-system   prometheus-kube-prometheus-kube-scheduler            ClusterIP      None            <none>                                                        10251/TCP                      28h
monitoring    prometheus-kube-prometheus-operator                  ClusterIP      10.43.180.73    <none>                                                        443/TCP                        28h
kube-system   prometheus-kube-prometheus-coredns                   ClusterIP      None            <none>                                                        9153/TCP                       28h
kube-system   prometheus-kube-prometheus-kube-etcd                 ClusterIP      None            <none>                                                        2379/TCP                       28h
kube-system   prometheus-kube-prometheus-kube-controller-manager   ClusterIP      None            <none>                                                        10252/TCP                      28h
monitoring    prometheus-kube-prometheus-alertmanager              ClusterIP      10.43.195.99    <none>                                                        9093/TCP                       28h
monitoring    prometheus-prometheus-node-exporter                  ClusterIP      10.43.171.218   <none>                                                        9100/TCP                       28h
monitoring    prometheus-grafana                                   ClusterIP      10.43.20.165    <none>                                                        80/TCP                         28h
monitoring    prometheus-kube-prometheus-prometheus                ClusterIP      10.43.207.29    <none>                                                        9090/TCP                       28h
monitoring    prometheus-kube-state-metrics                        ClusterIP      10.43.229.14    <none>                                                        8080/TCP                       28h
kube-system   prometheus-kube-prometheus-kubelet                   ClusterIP      None            <none>                                                        10250/TCP,10255/TCP,4194/TCP   28h
monitoring    alertmanager-operated                                ClusterIP      None            <none>                                                        9093/TCP,9094/TCP,9094/UDP     28h
monitoring    prometheus-operated                                  ClusterIP      None            <none>                                                        9090/TCP                       28h
kube-system   traefik                                              LoadBalancer   10.43.20.17     192.168.76.200,192.168.76.201,192.168.76.202,192.168.76.203   80:31131/TCP,443:31562/TCP     5d21h

Namespaces:

$ kubectl get namespaces 
NAME              STATUS   AGE
kube-system       Active   5d21h
default           Active   5d21h
kube-public       Active   5d21h
kube-node-lease   Active   5d21h
monitoring        Active   28h

But I couldn't reach the grafana service.
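
For debugging, port-forwarding straight to the service is one way to get at it from my PC, e.g.:

$ kubectl port-forward --namespace monitoring svc/prometheus-grafana 3000:80

and then opening http://localhost:3000, but of course that's not a solution; I want Traefik to route to it.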

Fair enough, I thought, let's define an Ingress. But it didn't work:

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
  - http:
      paths:
      - pathType: Prefix
        path: /
        backend:
          service:
            name: prometheus-grafana
            port:
              number: 80

I have no idea why it isn't getting to the service and I can't really see where the problem is. I understand containers and so on (I first had everything running on Docker Swarm), but I don't really know where, if anywhere, this would show up in the logs.
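
In case it helps, the only places I know to look are the Ingress description and the logs of the Traefik deployment that k3s installs:

$ kubectl describe ingress grafana-ingress --namespace monitoring
$ kubectl logs --namespace kube-system deployment/traefik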

I've spent the past couple of days trying all sorts of things and I finally found a hint about namespaces, problems calling services across them, and something called "type: ExternalName".

I checked with curl from a pod inside the cluster and the service is delivering the data inside the "monitoring" namespace, but Traefik can't get there, or maybe can't even see it?
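
The check was roughly this, using a throwaway pod (the exact image doesn't matter, anything with curl and an ARM build works):

$ kubectl run curl-test --rm -it --restart=Never --image=curlimages/curl --command -- \
    curl -sI http://prometheus-grafana.monitoring.svc.cluster.local

and the service answers, so DNS and the service itself are fine inside the cluster.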

Having looked at the Traefik documentation, I found this regarding namespaces, but I have no idea where I would even start to find the file it belongs in:

providers:
  kubernetesCRD:
    namespaces:

I'm assuming that k3s has set this up correctly as an empty list (which, as I understand it, means all namespaces are watched), because I can't find anything on their site that tells me what to do with their combination of "klipper-lb" and "traefik".
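
If it turns out I do need to change that, the only mechanism I've found is k3s's HelmChartConfig override, i.e. dropping a file like this into /var/lib/rancher/k3s/server/manifests/ on the controller (the valuesContent keys are just my reading of the Traefik chart's values, so treat it as a sketch):

---
# /var/lib/rancher/k3s/server/manifests/traefik-config.yaml
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: traefik
  namespace: kube-system
spec:
  valuesContent: |-
    providers:
      kubernetesCRD:
        namespaces: []   # empty list = watch all namespaces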

I finally tried to define another service with an external name:

---
apiVersion: v1
kind: Service
metadata:
  name: grafana-named
  namespace: kube-system
spec:
  type: ExternalName
  externalName: prometheus-grafana.monitoring.svc.cluster.local
  ports:
  - name: service
    protocol: TCP
    port: 80
    targetPort: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ingress
  namespace: kube-system
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
  - http:
      paths:
      - pathType: Prefix
        path: /
        backend:
          service:
            name: grafana-named
            port:
              number: 80

After 2-3 days I've tried everything I can think of, googled everything under the sun, and I still can't get to Grafana from outside of the cluster nodes.

I am at a loss as to how I can make anything work with k3s. I installed Lens on my main PC and can see almost everything there, but I think that the missing metrics information requires an Ingress or something like that too.

What do I have to do to get Traefik to do what I think is basically its job: route incoming requests to the backend services?

Bill Mair

1 Answer


I filed a bug report on GitHub and one of the people there (thanks again brandond) pointed me in the right direction.

The network layer uses Flannel to handle the "in cluster" networking. The default backend for that is something called "vxlan", which is seemingly more complex, tunnelling the traffic over virtual ethernet adapters.

For my requirements (read: getting the cluster to even work), the solution was to change the backend to "host-gw".

This is done by adding "--flannel-backend=host-gw" to the k3s server options in the k3s.service unit on the controller:

$ sudo systemctl edit k3s.service
### Editing /etc/systemd/system/k3s.service.d/override.conf
### Anything between here and the comment below will become the new contents of the file

[Service]
ExecStart=
ExecStart=/usr/local/bin/k3s \
    server \
      '--flannel-backend=host-gw'

### Lines below this comment will be discarded

The first "ExecStart=" clears the existing default start command to enable it to be replaced by the 2nd one.

Now everything is working as I expected, and I can finally move forward with learning K8s.

I'll probably reactivate "vxlan" at some point and figure that out too.

Bill Mair
  • Looks like I'm struggling at the same point. But changing flannel-backend doesn't seem to work for me. Did you get other insights? – Dirk Oct 19 '21 at 21:47