Kubernetes Version: 1.10

We are running Prometheus on a Kubernetes cluster (running on bare metal). Prometheus is running as a single pod.

ISSUE - Prometheus metrics are not persisted when its pod restarts. We tried configuring a persistent volume of type local, but if the pod is rescheduled to another node of the cluster, it loses all the data that was persisted on the previous node. We also tried configuring Prometheus remote storage for reads and writes, but it was too slow to be usable. Is there any other option for persisting data on Kubernetes on bare metal?

  • Did you specify a PersistentVolumeClaim for Prometheus? Maybe you can share your PersistentVolume and PersistentVolumeClaim YAMLs? – Blokje5 Dec 04 '18 at 09:04

1 Answer

I had the same issue while configuring Prometheus on bare metal, and this is how I resolved it.

You can use the local-storage storage class for the PV and PVC, which binds your PVC to a specific node. Whenever the node restarts, the pod is scheduled back onto the same node, where the volume lives. I am sharing my JSON files:

Prometheus-pv.json

{
  "kind": "PersistentVolume",
  "apiVersion": "v1",
  "metadata": {
    "name": "prometheus-vol",
    "namespace": "monitoring",
    "labels": {
      "type": "local",
      "app": "harmony-vol"
    }
  },
  "spec": {
    "capacity": {
      "storage": "10Gi"
    },
    "accessModes": [
      "ReadWriteOnce"
    ],
    "storageClassName": "local-storage",
    "local": {
      "path": "/data"
    },
    "claimRef": {
      "namespace": "default",
      "name": "data-prafull-0"
    },
    "nodeAffinity": {
      "required": {
        "nodeSelectorTerms": [
          {
            "matchExpressions": [
              {
                "key": "kubernetes.io/hostname",
                "operator": "In",
                "values": [
                  "<node_name>"
                ]
              }
            ]
          }
        ]
      }
    }
  }
}
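
Both files refer to a storage class named local-storage, so that class has to exist in the cluster. I did not include mine above; a minimal sketch of what it could look like, assuming statically created local volumes with no dynamic provisioner, is below. The file name is only illustrative, and the "WaitForFirstConsumer" binding mode is my assumption of a sensible setting: it delays binding the PVC until the pod is scheduled, so the scheduler can take the PV's node affinity into account.

local-storage-class.json

```json
{
  "kind": "StorageClass",
  "apiVersion": "storage.k8s.io/v1",
  "metadata": {
    "name": "local-storage"
  },
  "provisioner": "kubernetes.io/no-provisioner",
  "volumeBindingMode": "WaitForFirstConsumer"
}
```

Apply it before the PV and the Prometheus resource so that the claim can bind.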

Prometheus.json

{
    "apiVersion": "monitoring.coreos.com/v1",
    "kind": "Prometheus",
    "metadata": {
        "labels": {
            "prometheus": "prafull"
        },
        "name": "prafull",
        "namespace": "monitoring"
    },
    "spec": {
        "alerting": {
            "alertmanagers": [
                {
                    "name": "alertmanager-main",
                    "namespace": "monitoring",
                    "port": "web"
                }
            ]
        },
        "baseImage": "quay.io/prometheus/prometheus",
        "replicas": 2,
        "resources": {
            "requests": {
                "memory": "400Mi"
            }
        },
        "ruleSelector": {
            "matchLabels": {
                "prometheus": "prafull",
                "role": "alert-rules"
            }
        },
        "securityContext": {
            "fsGroup": 0,
            "runAsNonRoot": false,
            "runAsUser": 0
        },
        "serviceAccountName": "prometheus",
        "serviceMonitorSelector": {
            "matchExpressions": [
                {
                    "key": "k8s-app",
                    "operator": "Exists"
                }
            ]
        },
        "storage": {
            "class": "",
            "resources": {},
            "selector": {},
            "volumeClaimTemplate": {
                "metadata": {
                     "name": "data"
                },
                "spec": {
                    "accessModes": [
                         "ReadWriteOnce"
                    ],
                    "storageClassName": "local-storage",
                    "resources": {
                        "requests": {
                            "storage": "10Gi"
                        }
                    }
                }
            }
        },
        "version": "v2.2.1"
    }
}

After applying this, your pod will not be rescheduled to another node, because the PV and PVC are bound to that node.

Prafull Ladha
  • But that means your Prometheus is unavailable while the node is restarting, thus losing metrics. – piyushGoyal Dec 04 '18 at 10:00
  • Yes, you are right. For a prod env you can use Prometheus high availability: https://coreos.com/operators/prometheus/docs/latest/high-availability.html This HA brings additional complexity as well. On bare metal you are storing your data on node storage only, so a node restart leads to loss of metrics. Whether to go for Prometheus HA is the tough call you have to make. – Prafull Ladha Dec 04 '18 at 15:07