I'm trying to configure monitoring for Strimzi-managed Apache Kafka clusters. The Prometheus operator stack was already deployed with a Helm chart (not with the resources provided by Strimzi). The Kafka and ZooKeeper clusters are also deployed via Strimzi. To configure monitoring I did the following:
- Enabled metrics as in this example: https://github.com/strimzi/strimzi-kafka-operator/blob/0.34.0/examples/metrics/kafka-metrics.yaml
- Deployed the Prometheus rules provided by Strimzi: https://github.com/strimzi/strimzi-kafka-operator/blob/0.34.0/examples/metrics/prometheus-install/prometheus-rules.yaml
- Deployed the PodMonitor from https://github.com/strimzi/strimzi-kafka-operator/blob/0.34.0/examples/metrics/prometheus-install/strimzi-pod-monitor.yaml (replaced namespaceSelector.matchNames with the name of the namespace where my Kafka resources are deployed; see the sketch after this list)
- Edited the Prometheus resource (k edit prometheus <name>) to add the labels of my PodMonitor to podMonitorSelector.matchLabels
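For reference, assuming the upstream example is unchanged apart from my namespace edit, the PodMonitor that should match the Kafka pods looks roughly like this (abridged to the kafka-resources-metrics monitor; the upstream file ships several more):

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: kafka-resources-metrics
  labels:
    app: strimzi
spec:
  selector:
    matchExpressions:
    - key: "strimzi.io/kind"
      operator: In
      values: ["Kafka", "KafkaConnect", "KafkaMirrorMaker", "KafkaMirrorMaker2"]
  namespaceSelector:
    matchNames:
    - kafka
  podMetricsEndpoints:
  - path: /metrics
    port: tcp-prometheus-metrics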
But podMonitor is not displayed in service discovery in Prometheus. Did I missed something? Do you have any idea why it is not showing up?
My Prometheus resources are located in the monitoring namespace; here is the YAML manifest of the Prometheus resource:
apiVersion: v1
items:
- apiVersion: monitoring.coreos.com/v1
  kind: Prometheus
  metadata:
    annotations:
      meta.helm.sh/release-name: prometheus
      meta.helm.sh/release-namespace: monitoring
    creationTimestamp: "2021-06-28T05:38:09Z"
    generation: 45
    labels:
      app: strimzi
      app.kubernetes.io/instance: prometheus
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/part-of: kube-prometheus-stack
      app.kubernetes.io/version: 20.1.0
      chart: kube-prometheus-stack-20.1.0
      heritage: Helm
      release: prometheus
    name: prometheus-kube-prometheus-prometheus
    namespace: monitoring
    resourceVersion: "464711341"
    uid: 67fbbadc-0ecc-4f8a-9f7c-ba51efecb38e
  spec:
    additionalScrapeConfigs:
      key: additional-scrape-configs.yaml
      name: prometheus-kube-prometheus-prometheus-scrape-confg
    alerting:
      alertmanagers:
      - apiVersion: v2
        name: prometheus-kube-prometheus-alertmanager
        namespace: monitoring
        pathPrefix: /
        port: web
    enableAdminAPI: false
    evaluationInterval: 30s
    externalUrl:
    image: quay.io/prometheus/prometheus:v2.31.1
    listenLocal: false
    logFormat: logfmt
    logLevel: info
    paused: false
    podMonitorNamespaceSelector: {}
    podMonitorSelector:
      matchLabels:
        app: strimzi
        release: prometheus
    portName: web
    probeNamespaceSelector: {}
    probeSelector:
      matchLabels:
        release: prometheus
    remoteWrite:
    - name: bdsm
      url: http://promscale-connector.bdsm:9201/write
      writeRelabelConfigs:
      - action: keep
        regex: django_http_requests_latency_seconds_by_view_method_.*
        sourceLabels:
        - __name__
      - action: drop
        regex: (.{0})
        sourceLabels:
        - kubernetes_namespace
    - name: bdsm-prod
      url:
      writeRelabelConfigs:
      - action: keep
        regex: onec_business_transaction_duration_seconds_by_key_operation_.*
        sourceLabels:
        - __name__
      - action: replace
        regex: onec_business_transaction_duration_seconds_by_key_operation_(.*)
        replacement: onec_by_key_operation_${1}
        sourceLabels:
        - __name__
        targetLabel: __name__
      - action: drop
        regex: (.{0})
        sourceLabels:
        - kubernetes_namespace
    replicas: 1
    retention: 10d
    routePrefix: /
    ruleNamespaceSelector: {}
    ruleSelector:
      matchLabels:
        app: kube-prometheus-stack
        release: prometheus
    scrapeInterval: 30s
    securityContext:
      fsGroup: 2000
      runAsGroup: 2000
      runAsNonRoot: true
      runAsUser: 1000
    serviceAccountName: prometheus-kube-prometheus-prometheus
    serviceMonitorNamespaceSelector: {}
    serviceMonitorSelector:
      matchLabels:
        release: prometheus
    shards: 1
    storage:
      volumeClaimTemplate:
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 50Gi
          storageClassName: bim-store-ssd
    version: v2.31.1
  status:
    availableReplicas: 0
    conditions:
    - lastTransitionTime: "2023-04-12T12:31:15Z"
      message: 'shard 0: pod prometheus-prometheus-kube-prometheus-prometheus-0: containers
        with unready status: [prometheus]'
      reason: NoPodReady
      status: "False"
      type: Available
    - lastTransitionTime: "2023-04-12T11:15:23Z"
      status: "True"
      type: Reconciled
    paused: false
    replicas: 1
    shardStatuses:
    - availableReplicas: 0
      replicas: 1
      shardID: "0"
      unavailableReplicas: 1
      updatedReplicas: 1
    unavailableReplicas: 1
    updatedReplicas: 1
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
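Note that with the podMonitorSelector above, Prometheus only picks up PodMonitors that carry both the app: strimzi and the release: prometheus label. A quick way to compare the selector against the actual labels (assuming the PodMonitors were deployed into the kafka namespace):

kubectl -n kafka get podmonitors --show-labels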
Here is my Kafka resource, which is deployed in the kafka namespace:
apiVersion: v1
items:
- apiVersion: kafka.strimzi.io/v1beta2
  kind: Kafka
  metadata:
    annotations:
      meta.helm.sh/release-name: bus
      meta.helm.sh/release-namespace: kafka
    creationTimestamp: "2021-12-02T06:14:51Z"
    generation: 20
    name: bus-kafka-instance
    namespace: kafka
    resourceVersion: "464750838"
    uid: 65c34e05-7686-402e-b291-2553cce17741
  spec:
    entityOperator:
      topicOperator: {}
      userOperator: {}
    kafka:
      config:
        auto.create.topics.enable: "true"
        message.max.bytes: 10485880
        offsets.topic.replication.factor: 2
        transaction.state.log.min.isr: 2
        transaction.state.log.replication.factor: 2
      jmxOptions: {}
      listeners:
      - name: plain
        port: 9092
        tls: false
        type: internal
      - configuration:
          bootstrap:
            nodePort: 31081
        name: external
        port: 9094
        tls: false
        type: nodeport
      metricsConfig:
        type: jmxPrometheusExporter
        valueFrom:
          configMapKeyRef:
            key: kafka-metrics-config.yml
            name: kafka-metrics
      replicas: 3
      resources:
        limits:
          cpu: 2
          memory: 4Gi
        requests:
          cpu: 1
          memory: 1Gi
      storage:
        class: bim-store-ssd
        deleteClaim: false
        size: 100Gi
        type: persistent-claim
      template:
        pod:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: bim-node-type-kafka
                    operator: In
                    values:
                    - kafka-strimzi-node
            podAntiAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
              - podAffinityTerm:
                  labelSelector:
                    matchExpressions:
                    - key: strimzi.io/name
                      operator: In
                      values:
                      - bus-kafka-instance-kafka
                  topologyKey: kubernetes.io/hostname
                weight: 100
    kafkaExporter:
      enableSaramaLogging: true
      groupRegex: .*
      logging: info
      resources:
        limits:
          cpu: 500m
          memory: 128Mi
        requests:
          cpu: 200m
          memory: 64Mi
      template:
        pod:
          metadata:
            annotations:
              prometheus.io/scrape: "true"
      topicRegex: .*
    zookeeper:
      jmxOptions: {}
      metricsConfig:
        type: jmxPrometheusExporter
        valueFrom:
          configMapKeyRef:
            key: zookeeper-metrics-config.yml
            name: kafka-metrics
      replicas: 3
      storage:
        class: bim-store-ssd
        deleteClaim: false
        size: 20Gi
        type: persistent-claim
  status:
    clusterId: eOqIOXuEStuhlEdUwty-AA
    conditions:
    - lastTransitionTime: "2023-04-18T09:13:41.820Z"
      status: "True"
      type: Ready
    listeners:
    - addresses:
      - host: bus-kafka-instance-kafka-bootstrap.kafka.svc
        port: 9092
      bootstrapServers: bus-kafka-instance-kafka-bootstrap.kafka.svc:9092
      type: plain
    - addresses:
      - host: 10.20.20.185
        port: 31081
      - host: 10.20.30.200
        port: 31081
      - host: 10.20.28.156
        port: 31081
      bootstrapServers: 10.20.20.185:31081,10.20.30.200:31081,10.20.28.156:31081
      type: external
    observedGeneration: 20
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
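For completeness, the metricsConfig sections above reference a kafka-metrics ConfigMap. Assuming it follows the upstream kafka-metrics.yaml example linked at the top, it looks roughly like this (JMX exporter rules truncated; see the linked file for the full set):

kind: ConfigMap
apiVersion: v1
metadata:
  name: kafka-metrics
  labels:
    app: strimzi
data:
  kafka-metrics-config.yml: |
    lowercaseOutputName: true
    rules:
    # ... full rule set as in the upstream example ...
  zookeeper-metrics-config.yml: |
    lowercaseOutputName: true
    rules:
    # ... full rule set as in the upstream example ...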
Edit: 19.04.2023
The problem was a redundant label in the podMonitor selectors (see the corrected selector sketch below). The podMonitor now shows up in service discovery, but all target labels are dropped. Any idea why? Prometheus screenshot
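For anyone hitting the same issue: the Strimzi example PodMonitors only carry the app: strimzi label, so the likely culprit was the extra release: prometheus entry in podMonitorSelector.matchLabels, which needs to be reduced to something like:

podMonitorSelector:
  matchLabels:
    app: strimzi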