21

We are trying to monitor Kubernetes with Grafana and the Prometheus Operator. Most of the metrics are working as expected and I was able to see the dashboard with the right values; our system contains 10 nodes with 500 pods overall. But when I restarted Prometheus, all the data was deleted. I want it to be stored for two weeks.

My question is: how can I define a volume for Prometheus so that the data is kept for two weeks, or up to a 100GB database?

I found the following (we use the Prometheus Operator):

https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/storage.md

This is the config of the Prometheus Operator:

apiVersion: apps/v1beta2
kind: Deployment
metadata:
  labels:
    k8s-app: prometheus-operator
  name: prometheus-operator
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: prometheus-operator
  template:
    metadata:
      labels:
        k8s-app: prometheus-operator
    spec:
      containers:
      - args:
        - --kubelet-service=kube-system/kubelet
        - --logtostderr=true
        - --config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1
        - --prometheus-config-reloader=quay.io/coreos/prometheus-config-reloader:v0.29.0
        image: quay.io/coreos/prometheus-operator:v0.29.0
        name: prometheus-operator
        ports:
        - containerPort: 8080
          name: http

This is the config of the Prometheus resource:

    apiVersion: monitoring.coreos.com/v1
    kind: Prometheus
    metadata:
      name: prometheus
      namespace: monitoring
      labels: 
        prometheus: prometheus
    spec:
      replicas: 2
      serviceAccountName: prometheus
      serviceMonitorNamespaceSelector: {}
      serviceMonitorSelector:
        matchLabels:
          role: observeable
      tolerations:
      - key: "WorkGroup"
        operator: "Equal"
        value: "operator"
        effect: "NoSchedule"
      - key: "WorkGroup"
        operator: "Equal"
        value: "operator"
        effect: "NoExecute"
      resources:
        limits:
          cpu: 8000m
          memory: 24000Mi
        requests:
          cpu: 6000m
          memory: 6000Mi
      storage:
        volumeClaimTemplate:
          spec:
            selector:
              matchLabels:
                app: prometheus
            resources:
              requests:
                storage: 100Gi


We have an NFS file system, and the above storage config doesn't work. My questions are:

  1. What I'm missing is how to configure the volume, server, and path under the nfs section. Where should I find this /path/to/prom/db? How can I refer to it? Should I create it somehow, or just provide the path?

We have NFS configured in our system.

  2. How do I connect it to Prometheus?

As I don't have deep knowledge of PVCs and PVs, I've created the following (I'm not sure about those values: what is my server, and what path should I provide?)...

server: myServer
path: "/path/to/prom/db"

What should I put there, and how do I make my Prometheus (i.e. the config provided in the question) use it?

apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
    prometheus: prometheus
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce # required
  nfs:
    server: myServer
    path: "/path/to/prom/db"

Is there any persistent volume type other than NFS that I can use for my use case? Please advise how.

JME
  • Does the query work directly from Prometheus? I mean, when you query directly from the Prometheus UI. –  Mar 11 '19 at 19:46
  • Also, do you have audit logging enabled? If yes, can you see whether API requests are going from the prometheus serviceaccount/user towards the API server? –  Mar 11 '19 at 19:47
  • @JasonStanley - thanks for the suggestion. How should I use this `{pod=~"^$Pod$"})` query in the Prometheus UI? I want to run a query to get the data for `all pods in the cluster`... (all nodes' pods) – JME Mar 11 '19 at 19:56
  • In the Prometheus UI, just run the query `kube_pod_container_resource_limits_cpu_cores`. This should return a long list of metrics for ALL your pods. If that query returns results, it means the Prometheus config is OK and something needs to be tuned in Grafana. BUT if you're not getting a response to the query, then the problem lies with your Prometheus config. –  Mar 11 '19 at 20:03
  • yes, your query should ONLY be `kube_pod_container_resource_limits_cpu_cores` –  Mar 11 '19 at 20:06
  • OK, so now we are sure that PROM is OK, let's focus on Grafana. Create a new dashboard and put in the following separate queries: `kube_pod_container_resource_limits_cpu_cores` and `sum(kube_pod_container_resource_limits_cpu_cores)`. You can change the time interval to show the last 1 minute for testing. What do you get? –  Mar 11 '19 at 20:16

5 Answers

4

I started working with the operator chart recently, and managed to add persistence without defining a PV and PVC.

With the new chart configuration, adding persistence is much easier than you describe: just edit the file /helm/vector-chart/prometheus-operator-chart/values.yaml under prometheus.prometheusSpec:

storageSpec:
  volumeClaimTemplate:
    spec:
      storageClassName: prometheus
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
    selector: {}

And add this StorageClass definition as /helm/vector-chart/prometheus-operator-chart/templates/prometheus/storageClass.yaml:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: prometheus
provisioner: kubernetes.io/aws-ebs
reclaimPolicy: Retain
parameters:
  type: gp2
  zones: "ap-southeast-2a, ap-southeast-2b, ap-southeast-2c"
  encrypted: "true"

This will automatically create both a PV and a PVC, which will provision an EBS volume in AWS where all your data is stored.
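
To apply the change, you'd then redeploy the chart, e.g. with something like `helm upgrade <release-name> /helm/vector-chart/prometheus-operator-chart -f values.yaml` (the release name and exact command are assumptions based on a standard Helm workflow; adjust them to your setup).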

Shahar Hamuzim Rajuan
  • This is the answer I was looking for, thanks. Although I didn't need to create a storage class: I'm using AKS, which has 2 by default, `default|managed-premium`. You can view them using the following command: `kubectl get storageclass`. – Christo Mar 27 '20 at 12:44
1

You must use a PersistentVolume and PersistentVolumeClaim (PV & PVC) to persist data. Refer to https://kubernetes.io/docs/concepts/storage/persistent-volumes/ and look carefully at provisioning, reclaim policy, access mode, and storage type.
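
For the NFS case in the question, a minimal sketch of the PV might look like this (the server address and export path are placeholders for illustration; replace them with your actual NFS server and a directory it already exports):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: prometheus-pv
  labels:
    app: prometheus              # the volumeClaimTemplate selector in the question matches this
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""           # empty string keeps a default StorageClass from interfering
  nfs:
    server: nfs.example.com      # placeholder: your NFS server's hostname or IP
    path: /exports/prometheus    # placeholder: a directory already exported by that server

Note that with the Prometheus Operator you don't normally create the PVC by hand: the operator generates one from storage.volumeClaimTemplate in the Prometheus resource, and the selector there (matchLabels: app: prometheus in the question's config) binds it to a PV labeled like the one above. If you set storageClassName: "" on the PV, you may need to set it in the claim template as well so the two match.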

k''
  • well, I know that :), the problem is that I wasn't able to figure it out for Prometheus. It would be great if you could provide an example for my context – JME Mar 23 '19 at 12:11
  • I usually install Prometheus and Grafana from the default Helm repository using "helm install --name prometheus stable/prometheus". One option is to inspect the whole Helm chart, or run the above command and then describe all the components of the chart. You will definitely get it. – k'' Mar 23 '19 at 13:22
1

To determine when to remove old data, use the --storage.tsdb.retention switch,

e.g. --storage.tsdb.retention='7d' (by default, Prometheus keeps data for 15 days).

To completely remove the data, use this API call (note that the TSDB admin API must be enabled by starting Prometheus with the --web.enable-admin-api flag):

$ curl -X POST -g 'http://<your_host>:9090/api/v1/admin/tsdb/<your_index>'

EDIT

Sample Kubernetes snippet:

...
    spec:
      containers:
      - name: prometheus
        image: docker.io/prom/prometheus:v2.0.0
        args:
          - '--config.file=/etc/prometheus/prometheus.yml'
          - '--storage.tsdb.retention=7d'
        ports:
        - name: web
          containerPort: 9090
...
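
Note: the snippet above is for a plain Prometheus Deployment. With the Prometheus Operator (as in the question) you don't set the container args yourself; the Prometheus custom resource exposes a retention field instead, which the operator translates into the retention flag. A sketch of what that would look like in the question's second file:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  retention: 14d   # two weeks, as asked in the question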
mati kepa
  • thanks, where should I put this parameter in the YAML? Can you provide an example? And where does it keep the data if I don't provide a volume? – JME Mar 26 '19 at 10:50
  • Thanks, please see my update. We are using the Prometheus Operator and I put the config in the question (there are two files: 1 is the operator, 2 is the Prometheus CRD). How should I update it, given that the container comes from the operator and not directly from the Prometheus CRD? Can you please update? – JME Mar 26 '19 at 14:20
  • Inside your operator config under `spec.template.spec.containers.args` --------------------- Please read about the persistent volume concept in Docker. By default the data will be stored inside the container until restart (so it can be 5 minutes or several weeks). The important point is that containers are meant to be ephemeral (short lifetime). – mati kepa Mar 26 '19 at 14:36
  • OK, I'm not sure about the args. Can you update your answer to make it clear which file I should pass them in? Currently I use args in the second file (the operator) and not in the CRD. – JME Mar 26 '19 at 14:41
  • This is a bit confusing since I'm not using the Prometheus Docker config; we are using the operator config. It would be great if you could update the second file from the question inside your answer – JME Mar 26 '19 at 14:50
  • so I need to add it explicitly to the `first file`? – JME Mar 26 '19 at 15:45
  • when I try to add your suggestion to the first file I get the error: `spec.containers.ports.containerPort in body is required`. I added it as-is to the first file. It would be great if you provided the full answer in my context, i.e. my file – JME Mar 26 '19 at 15:56
  • when I remove the port it works, but when I change, for example, the version of Prometheus, I see the change only in the operator (the second config file described) and not in the replica... – JME Mar 26 '19 at 16:05
  • I've put a bounty on the question. It would be great if you could provide a complete example which I can use with the Prometheus Operator (my config file in the question...). Thanks – JME Mar 26 '19 at 16:15
1

Refer to the code below. Define storage-retention as 7d (or the required number of retention days) in a ConfigMap and load it as an environment variable in the container, as shown below:

      containers:
      - name: prometheus
        image: prom/prometheus:latest
        args:
          - '--storage.tsdb.path=/prometheus'
          - '--storage.tsdb.retention=$(STORAGE_RETENTION)'
          - '--web.enable-lifecycle'
          - '--storage.tsdb.no-lockfile'
          - '--config.file=/etc/prometheus/prometheus.yml'
        ports:
        - name: web
          containerPort: 9090
        env:
        - name: STORAGE_RETENTION
          valueFrom:
            configMapKeyRef:
              name: prometheus.cfg
              key: storage-retention
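
The ConfigMap referenced by that env block isn't shown above; a minimal sketch of what it might look like (the name and key come from the configMapKeyRef above, the namespace and value are assumptions):

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus.cfg
  namespace: monitoring     # assumption: same namespace as the Prometheus pod
data:
  storage-retention: 7d     # picked up via $(STORAGE_RETENTION) in the args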

You might need to adjust these settings in the Prometheus Operator files.

P Ekambaram
  • Thanks. I'm using the Prometheus Operator; please see my files in the question and provide this example in that context, since there are some differences between the operator and Prometheus alone. 2. I need a `volume`, since if the pod is killed the retention period will not help... – JME Mar 28 '19 at 09:09
  • I need to define a volume, and in the question I have the config of Prometheus and the operator. What I'm missing is the `nfs` config, `server` and `path`... how can I add and configure them etc., that's it... – JME Mar 28 '19 at 09:16
  • create an nfs pv and then bind it with a pvc. Map the pvc to the prometheus data in the deployment yaml – P Ekambaram Mar 28 '19 at 09:58
-1

Providing some insight from what I gathered, since we just started setting up the kube-prometheus operator and ran into storage issues with the default settings.

Create a custom values.yaml with the helm show values command, as below, containing the default values:

helm show values prometheus-com/kube-prometheus-stack -n monitoring > custom-values.yaml

Then start updating the prometheus, alertmanager, and grafana sections to either override default settings or add custom names, etc.

Coming to the storage options, the documentation describes how to define a custom StorageClass or PV/PVC (if there is no default SC, or for other reasons).

There is also a good example of using a StorageClass for all 3 pods.
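
For reference, the storage-related overrides end up in sections of custom-values.yaml like the following (a sketch with assumed values; the key paths follow the kube-prometheus-stack values layout, and the StorageClass name and sizes are placeholders):

prometheus:
  prometheusSpec:
    retention: 14d                             # e.g. keep two weeks of data
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: my-storage-class   # placeholder: your StorageClass
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 100Gi

alertmanager:
  alertmanagerSpec:
    storage:
      volumeClaimTemplate:
        spec:
          storageClassName: my-storage-class
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 10Gi

grafana:
  persistence:
    enabled: true
    storageClassName: my-storage-class
    size: 10Gi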

cnu