Prometheus, deployed on Kubernetes with the Prometheus Operator, is consuming too much memory; it is currently at ~12 GB. The /prometheus/wal directory is also at ~12 GB. I have removed all *.tmp files, but that didn't help.
I am unable to figure out a solution to this problem. Any suggestions?

2 Answers
Reduce your retention time or reduce your number of time series.
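For illustration, a minimal sketch of lowering retention when using the Prometheus Operator (the resource name, namespace, and values below are assumptions; adjust them to your cluster). The retention field maps to Prometheus's --storage.tsdb.retention.time flag:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s             # example name, replace with your Prometheus resource
  namespace: monitoring # example namespace
spec:
  retention: 15d        # keep samples for 15 days instead of 30
  retentionSize: 10GB   # optional cap on on-disk TSDB size (newer operator versions)

Note that retention mainly limits disk usage; memory is driven primarily by the number of active time series, which is what the second suggestion addresses.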

- The retention period is 30 days, but how is it related to memory? – Yogesh Jilhawar May 06 '20 at 08:59
- How do I reduce the number of time series? – Yogesh Jilhawar May 06 '20 at 08:59
Digging through Google for a couple of days, I found that there are a lot of unused metrics whose samples can be dropped [1].
I then searched the Prometheus/Grafana configuration directory for uses of the metrics whose sample counts are very high.
Query to find the Prometheus metrics with the most series:
topk(20, count by (__name__, job)({__name__=~".+"}))
If you find TCP or UDP metrics in this list, try querying them in Prometheus. If the value is zero, they are safe to drop, since those metrics are already disabled at the cAdvisor level because they produce a large number of samples.
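As a hedged example of that check (assuming Prometheus is reachable on localhost:9090, e.g. via kubectl port-forward; the service name and metric name below are placeholders):

kubectl -n monitoring port-forward svc/prometheus-k8s 9090 &
# Ask the HTTP API for the current value of the candidate metric.
curl -s -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(container_network_tcp_usage_total)'
# An empty result (or a value of 0) suggests the metric carries no useful data
# and is a candidate for dropping at scrape time.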
Bash commands to check whether these metrics are used anywhere in Prometheus or Grafana:
cd <prometheus dir>
grep -irn <metric_name>
If a metric is not used anywhere, simply add a drop action for it in the corresponding job.
Note: you get the job name from the PromQL query executed in the first step.
I am using the Prometheus Operator, so I have to edit the corresponding ServiceMonitor definition (see the sketch after the snippet below). If you deploy Prometheus the usual way, you may need to edit the prometheus.yml file instead:
metric_relabel_configs:
- source_labels: [ __name__ ]
  regex: 'metric_name'
  action: drop
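For the Operator case, a sketch of the equivalent rule in a ServiceMonitor using metricRelabelings (the name, namespace, selector, port, and the metric names in the regex are placeholders, not taken from the question):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app        # placeholder
  namespace: monitoring    # placeholder
spec:
  selector:
    matchLabels:
      app: example-app     # placeholder selector
  endpoints:
  - port: metrics          # placeholder port name
    metricRelabelings:     # applied after the scrape, before ingestion
    - sourceLabels: [__name__]
      regex: 'container_network_tcp_usage_total|container_network_udp_usage_total'
      action: drop

Once the operator reloads the generated configuration, the dropped series stop being ingested; existing samples age out with the retention period.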
References:
[1] Dropping metrics at scrape time with Prometheus – https://www.robustperception.io/dropping-metrics-at-scrape-time-with-prometheus
[2] Formula to calculate the RAM Prometheus needs – https://www.robustperception.io/how-much-ram-does-prometheus-2-x-need-for-cardinality-and-ingestion
