
I am not really sure if this is a Prometheus issue, just a Longhorn issue, or a combination of the two.

Setup:

  • Kubernetes K3s v1.21.9+k3s1
  • Rancher Longhorn Storage Provider 1.2.2
  • Prometheus Helm Chart 32.2.1 and image: quay.io/prometheus/prometheus:v2.33.1

Problem:

Infinitely growing PV in Longhorn, even beyond the defined maximum size. Currently using about 75 GB on a 50 GB volume.

Description:

I have a really small 3-node cluster with not too many deployments running. Currently there is only one "real" application; the rest is just Kubernetes system stuff so far.
Apart from etcd, I am using all the default scraping rules.
The PV fills up by a bit more than 1 GB per day, which seems fine to me.

The problem is that, for whatever reason, the space used inside Longhorn keeps growing. I have configured retention rules for the Helm chart with retention: 7d and retentionSize: 25GB, so the retentionSize should never be reached anyway.
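For reference, this is roughly how I set the retention (a minimal sketch; the release name "prometheus" and the "monitoring" namespace are placeholders for my setup, and I'm assuming the kube-prometheus-stack value keys here):

    # set TSDB retention via the chart values (placeholder release/namespace)
    helm upgrade prometheus prometheus-community/kube-prometheus-stack \
      --namespace monitoring \
      --reuse-values \
      --set prometheus.prometheusSpec.retention=7d \
      --set prometheus.prometheusSpec.retentionSize=25GB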
When I log into the container's shell and run du -sh in /prometheus, it shows ~8.7 GB in use, which also looks fine to me.
The problem is that when I look at the Longhorn UI, the used space keeps growing. The PV has existed for about 20 days now and is currently using almost 75 GB of a defined maximum of 50 GB. When I look at the Kubernetes node itself and inspect the folder Longhorn uses to store its PV data, I see the same usage as in the Longhorn UI, while inside the Prometheus container everything looks fine to me.
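This is roughly how I compare the two numbers (a sketch; namespace, pod name and the volume id are placeholders, and I'm assuming Longhorn's default /var/lib/longhorn data path on the node):

    # inside the Prometheus pod: logical size of the TSDB
    kubectl -n monitoring exec prometheus-<pod-name> -c prometheus -- du -sh /prometheus

    # on the Kubernetes node: space the Longhorn replica actually consumes
    du -sh /var/lib/longhorn/replicas/pvc-<volume-id>-*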

I hope someone has an idea what the problem could be. I have not experienced this issue with any other deployment so far; all the others behave normally and their used size actually decreases when something inside the container gets deleted.

sdobedev

1 Answer


Could the snapshots be the reason for the increasing size? As I understand it, Longhorn takes snapshots, and they are added to the total actual size used on the node whenever the data in a snapshot differs from the current data in the volume. That happens in your case, because old metrics are deleted and new ones are written all the time.

See this comment and this one.
I know I'm answering late, but I came across the same issue and maybe this helps someone.
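If you want to check this on the CLI instead of the UI, something like the following should show it (a sketch; I'm assuming Longhorn runs in the longhorn-system namespace and that the volume object exposes the actual on-node size in its status, the same value the UI displays):

    # list the Longhorn volume objects and find the one backing the Prometheus PV
    kubectl -n longhorn-system get volumes.longhorn.io

    # nominal size vs. space actually consumed on the node (includes snapshot history)
    kubectl -n longhorn-system get volumes.longhorn.io pvc-<volume-id> \
      -o jsonpath='{.spec.size}{"\n"}{.status.actualSize}{"\n"}'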

JcGKitten