I'm trying to configure monitoring for Strimzi-managed Apache Kafka clusters. The Prometheus operator stack was already deployed with a Helm chart (not with the resources provided by Strimzi). The Kafka and ZooKeeper clusters are also deployed via Strimzi. To configure monitoring I did the following:
- Enabled metrics as in this example: https://github.com/strimzi/strimzi-kafka-operator/blob/0.34.0/examples/metrics/kafka-metrics.yaml
- Deployed the Prometheus rules provided by Strimzi: https://github.com/strimzi/strimzi-kafka-operator/blob/0.34.0/examples/metrics/prometheus-install/prometheus-rules.yaml
- Deployed the PodMonitor from https://github.com/strimzi/strimzi-kafka-operator/blob/0.34.0/examples/metrics/prometheus-install/strimzi-pod-monitor.yaml (replaced namespaceSelector.matchNames with the name of the namespace where my Kafka resources are deployed; see the sketch after this list)
- Edited the Prometheus resource (k edit prometheus <name>) to add the labels of my PodMonitor to podMonitorSelector.matchLabels
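For reference, assuming the upstream example is unchanged apart from my namespace edit, the PodMonitor that should match the Kafka pods looks roughly like this (abridged to the kafka-resources-metrics monitor; the upstream file ships several more):

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: kafka-resources-metrics
  labels:
    app: strimzi
spec:
  selector:
    matchExpressions:
    - key: "strimzi.io/kind"
      operator: In
      values: ["Kafka", "KafkaConnect", "KafkaMirrorMaker", "KafkaMirrorMaker2"]
  namespaceSelector:
    matchNames:
    - kafka
  podMetricsEndpoints:
  - path: /metrics
    port: tcp-prometheus-metrics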
But podMonitor is not displayed in service discovery in Prometheus. Did I missed something? Do you have any idea why it is not showing up?
My Prometheus resources are located in the monitoring namespace; here is the YAML manifest of the Prometheus resource:
apiVersion: v1
items:
- apiVersion: monitoring.coreos.com/v1
  kind: Prometheus
  metadata:
    annotations:
      meta.helm.sh/release-name: prometheus
      meta.helm.sh/release-namespace: monitoring
    creationTimestamp: "2021-06-28T05:38:09Z"
    generation: 45
    labels:
      app: strimzi
      app.kubernetes.io/instance: prometheus
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/part-of: kube-prometheus-stack
      app.kubernetes.io/version: 20.1.0
      chart: kube-prometheus-stack-20.1.0
      heritage: Helm
      release: prometheus
    name: prometheus-kube-prometheus-prometheus
    namespace: monitoring
    resourceVersion: "464711341"
    uid: 67fbbadc-0ecc-4f8a-9f7c-ba51efecb38e
  spec:
    additionalScrapeConfigs:
      key: additional-scrape-configs.yaml
      name: prometheus-kube-prometheus-prometheus-scrape-confg
    alerting:
      alertmanagers:
      - apiVersion: v2
        name: prometheus-kube-prometheus-alertmanager
        namespace: monitoring
        pathPrefix: /
        port: web
    enableAdminAPI: false
    evaluationInterval: 30s
    externalUrl:
    image: quay.io/prometheus/prometheus:v2.31.1
    listenLocal: false
    logFormat: logfmt
    logLevel: info
    paused: false
    podMonitorNamespaceSelector: {}
    podMonitorSelector:
      matchLabels:
        app: strimzi
        release: prometheus
    portName: web
    probeNamespaceSelector: {}
    probeSelector:
      matchLabels:
        release: prometheus
    remoteWrite:
    - name: bdsm
      url: http://promscale-connector.bdsm:9201/write
      writeRelabelConfigs:
      - action: keep
        regex: django_http_requests_latency_seconds_by_view_method_.*
        sourceLabels:
        - __name__
      - action: drop
        regex: (.{0})
        sourceLabels:
        - kubernetes_namespace
    - name: bdsm-prod
      url:
      writeRelabelConfigs:
      - action: keep
        regex: onec_business_transaction_duration_seconds_by_key_operation_.*
        sourceLabels:
        - __name__
      - action: replace
        regex: onec_business_transaction_duration_seconds_by_key_operation_(.*)
        replacement: onec_by_key_operation_${1}
        sourceLabels:
        - __name__
        targetLabel: __name__
      - action: drop
        regex: (.{0})
        sourceLabels:
        - kubernetes_namespace
    replicas: 1
    retention: 10d
    routePrefix: /
    ruleNamespaceSelector: {}
    ruleSelector:
      matchLabels:
        app: kube-prometheus-stack
        release: prometheus
    scrapeInterval: 30s
    securityContext:
      fsGroup: 2000
      runAsGroup: 2000
      runAsNonRoot: true
      runAsUser: 1000
    serviceAccountName: prometheus-kube-prometheus-prometheus
    serviceMonitorNamespaceSelector: {}
    serviceMonitorSelector:
      matchLabels:
        release: prometheus
    shards: 1
    storage:
      volumeClaimTemplate:
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 50Gi
          storageClassName: bim-store-ssd
    version: v2.31.1
  status:
    availableReplicas: 0
    conditions:
    - lastTransitionTime: "2023-04-12T12:31:15Z"
      message: 'shard 0: pod prometheus-prometheus-kube-prometheus-prometheus-0: containers
        with unready status: [prometheus]'
      reason: NoPodReady
      status: "False"
      type: Available
    - lastTransitionTime: "2023-04-12T11:15:23Z"
      status: "True"
      type: Reconciled
    paused: false
    replicas: 1
    shardStatuses:
    - availableReplicas: 0
      replicas: 1
      shardID: "0"
      unavailableReplicas: 1
      updatedReplicas: 1
    unavailableReplicas: 1
    updatedReplicas: 1
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
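Note that with the podMonitorSelector above, Prometheus only picks up PodMonitors that carry both the app: strimzi and the release: prometheus label. A quick way to compare the selector against the actual labels (assuming the PodMonitors were deployed into the kafka namespace):

kubectl -n kafka get podmonitors --show-labels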
Here is my Kafka resource, which is deployed in the kafka namespace:
apiVersion: v1
items:
- apiVersion: kafka.strimzi.io/v1beta2
  kind: Kafka
  metadata:
    annotations:
      meta.helm.sh/release-name: bus
      meta.helm.sh/release-namespace: kafka
    creationTimestamp: "2021-12-02T06:14:51Z"
    generation: 20
    name: bus-kafka-instance
    namespace: kafka
    resourceVersion: "464750838"
    uid: 65c34e05-7686-402e-b291-2553cce17741
  spec:
    entityOperator:
      topicOperator: {}
      userOperator: {}
    kafka:
      config:
        auto.create.topics.enable: "true"
        message.max.bytes: 10485880
        offsets.topic.replication.factor: 2
        transaction.state.log.min.isr: 2
        transaction.state.log.replication.factor: 2
      jmxOptions: {}
      listeners:
      - name: plain
        port: 9092
        tls: false
        type: internal
      - configuration:
          bootstrap:
            nodePort: 31081
        name: external
        port: 9094
        tls: false
        type: nodeport
      metricsConfig:
        type: jmxPrometheusExporter
        valueFrom:
          configMapKeyRef:
            key: kafka-metrics-config.yml
            name: kafka-metrics
      replicas: 3
      resources:
        limits:
          cpu: 2
          memory: 4Gi
        requests:
          cpu: 1
          memory: 1Gi
      storage:
        class: bim-store-ssd
        deleteClaim: false
        size: 100Gi
        type: persistent-claim
      template:
        pod:
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: bim-node-type-kafka
                    operator: In
                    values:
                    - kafka-strimzi-node
            podAntiAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
              - podAffinityTerm:
                  labelSelector:
                    matchExpressions:
                    - key: strimzi.io/name
                      operator: In
                      values:
                      - bus-kafka-instance-kafka
                  topologyKey: kubernetes.io/hostname
                weight: 100
    kafkaExporter:
      enableSaramaLogging: true
      groupRegex: .*
      logging: info
      resources:
        limits:
          cpu: 500m
          memory: 128Mi
        requests:
          cpu: 200m
          memory: 64Mi
      template:
        pod:
          metadata:
            annotations:
              prometheus.io/scrape: "true"
      topicRegex: .*
    zookeeper:
      jmxOptions: {}
      metricsConfig:
        type: jmxPrometheusExporter
        valueFrom:
          configMapKeyRef:
            key: zookeeper-metrics-config.yml
            name: kafka-metrics
      replicas: 3
      storage:
        class: bim-store-ssd
        deleteClaim: false
        size: 20Gi
        type: persistent-claim
  status:
    clusterId: eOqIOXuEStuhlEdUwty-AA
    conditions:
    - lastTransitionTime: "2023-04-18T09:13:41.820Z"
      status: "True"
      type: Ready
    listeners:
    - addresses:
      - host: bus-kafka-instance-kafka-bootstrap.kafka.svc
        port: 9092
      bootstrapServers: bus-kafka-instance-kafka-bootstrap.kafka.svc:9092
      type: plain
    - addresses:
      - host: 10.20.20.185
        port: 31081
      - host: 10.20.30.200
        port: 31081
      - host: 10.20.28.156
        port: 31081
      bootstrapServers: 10.20.20.185:31081,10.20.30.200:31081,10.20.28.156:31081
      type: external
    observedGeneration: 20
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
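For completeness, the metricsConfig sections above reference a kafka-metrics ConfigMap. Assuming it follows the upstream kafka-metrics.yaml example linked at the top, it looks roughly like this (JMX exporter rules truncated; see the linked file for the full set):

kind: ConfigMap
apiVersion: v1
metadata:
  name: kafka-metrics
  labels:
    app: strimzi
data:
  kafka-metrics-config.yml: |
    lowercaseOutputName: true
    rules:
    # ... full rule set as in the upstream example ...
  zookeeper-metrics-config.yml: |
    lowercaseOutputName: true
    rules:
    # ... full rule set as in the upstream example ...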
Edit: 19.04.2023
The problem was a redundant label in the podMonitor selectors (see the corrected selector sketch below). The podMonitor now shows up in service discovery, but all target labels are dropped. Any idea why? Prometheus screenshot
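For anyone hitting the same issue: the Strimzi example PodMonitors only carry the app: strimzi label, so the likely culprit was the extra release: prometheus entry in podMonitorSelector.matchLabels, which needs to be reduced to something like:

podMonitorSelector:
  matchLabels:
    app: strimzi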