I collect metrics from systems with node exporter and from applications' own endpoints. Some of them are ever-increasing counters, such as sales volume, and some are gauges that fluctuate, such as CPU load. In total I collect metrics with about 400 different names from roughly 100 VMs.
There are several ways to keep metrics for a long time; currently I send them to InfluxDB via Telegraf using remote_write. Of course, shipping every raw sample also puts unnecessary load on InfluxDB. My goal is to be able to look back two years later and see daily summaries, so I don't need to keep the raw 5-minute samples. A summary every 6 hours, for example, would be enough. For counter-type metrics I can simply take the final value of each window, but gauge-type metrics need to be averaged. Do I really need to create rules for 200 different metrics to do this? How do I design such a system?
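To make it concrete, here is a minimal sketch of what I imagine those per-metric rules would look like, in the style of Prometheus recording rules (the metric names `node_load1` and `sales_total` are just placeholders for my real metrics):

```yaml
groups:
  - name: downsample_6h
    interval: 6h                # evaluate once per 6-hour window
    rules:
      # Gauge: store the average over the window
      - record: node_load1:avg_6h
        expr: avg_over_time(node_load1[6h])
      # Counter: store the final raw value seen in the window
      - record: sales_total:last_6h
        expr: last_over_time(sales_total[6h])
```

If I understand correctly, I would need a pair of rules like this for every metric name, which is exactly the part that seems unmanageable.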
Does the same difficulty apply if I use Cortex, Thanos, or Mimir instead of InfluxDB? In the end, I think storing summaries would be enough, rather than keeping instant data for a long time. However, summing the metrics up and condensing them with the rate function doesn't seem like a correct solution either.
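For example, this is the kind of rate-based rule I mean, which I don't think is right, because it stores a per-second rate and loses the actual final counter values (again with a placeholder metric name):

```yaml
# Aggregates all per-VM counters into a single rate; the raw totals are gone
- record: sales:rate_6h
  expr: sum(rate(sales_total[6h]))
```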