At the moment we are using https://github.com/SumoLogic/sumologic-kubernetes-collection to collect metrics and logs from our EKS clusters and send them to the Sumo Logic backend. The problem is that we pay quite heavily for our ~60 GB of logs per day and ~70k DPM (data points per minute) of metrics. When we started with 2 EKS clusters the problem was not visible, but now we have to scrape metrics every 4 minutes on non-production clusters and every 50 s on production clusters just to stay below the metrics DPM limit and avoid additional costs.
I am thinking about how to address this issue. We have already trimmed the list of metrics as far as we could, and paying for additional credits is not an option. Looking at alternatives, CloudWatch comes to mind, since we already use CW metrics for other parts of our stack (RDS, SQS, ALB, Lambda, etc.).
Does anyone have experience setting up EKS metrics monitoring with CloudWatch? I would like to have the option to:
- get metrics related to pod runtime (pod status, CPU requests/limits, pod restarts, etc.)
- send custom metrics to CW (for example, JVM metrics fetched through JMX, or Pulsar metrics fetched from the Prometheus /metrics endpoint)
- get metrics related to EKS workers and masters (status, CPU/memory usage, etc.)
- send Slack and PagerDuty notifications
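For the custom-metrics requirement, I imagine something along these lines: scrape a Prometheus-style /metrics endpoint and re-shape each sample into the MetricDatum structure that CloudWatch's PutMetricData API accepts. The helper below is only a sketch; the metric names and the single-line parser are illustrative, not taken from any real endpoint:

```python
import re

# Matches one line of Prometheus text exposition format, e.g.
#   jvm_memory_bytes_used{area="heap"} 123456789
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'  # metric name
    r'(?:\{(?P<labels>[^}]*)\})?'           # optional {label="value",...}
    r'\s+(?P<value>\S+)'                    # sample value
)

def prom_sample_to_cw_datum(line):
    """Convert one Prometheus sample line into a CloudWatch MetricDatum dict.

    Labels become CloudWatch dimensions. Returns None for non-sample lines
    (comments, blanks, malformed input).
    """
    m = SAMPLE_RE.match(line.strip())
    if not m:
        return None
    dimensions = []
    if m.group("labels"):
        for pair in m.group("labels").split(","):
            key, _, val = pair.partition("=")
            dimensions.append({"Name": key.strip(), "Value": val.strip().strip('"')})
    return {
        "MetricName": m.group("name"),
        "Dimensions": dimensions,
        "Value": float(m.group("value")),
    }

datum = prom_sample_to_cw_datum('jvm_memory_bytes_used{area="heap"} 123456789')
print(datum)

# Shipping the datum would then be one boto3 call (not executed here;
# "Custom/EKS" is a made-up namespace):
# import boto3
# boto3.client("cloudwatch").put_metric_data(Namespace="Custom/EKS", MetricData=[datum])
```

One thing to keep in mind with this route: PutMetricData is billed per custom metric, so pushing every Prometheus sample verbatim could just move the cost problem rather than solve it.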
Is this kind of setup possible with CloudWatch, perhaps with a Prometheus instance acting as a middleman that temporarily gathers custom metrics (e.g. JVM, Pulsar) from "external"/non-AWS sources and forwards them to CW?
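To make the middleman idea concrete, the Prometheus side could be an ordinary scrape config like the fragment below; the job names, hostnames and ports are placeholders I made up, and the forwarding-to-CW step would still need something on top (e.g. the CloudWatch agent's Prometheus support, or a small custom shipper):

```yaml
# Sketch of a prometheus.yml fragment for the "middleman" instance.
# Targets and ports below are illustrative placeholders, not real endpoints.
scrape_configs:
  - job_name: jvm-jmx        # JMX exporter exposing JVM metrics
    static_configs:
      - targets: ['jmx-exporter.example.internal:9404']
  - job_name: pulsar         # Pulsar's built-in Prometheus /metrics endpoint
    metrics_path: /metrics
    static_configs:
      - targets: ['pulsar-broker.example.internal:8080']
```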
Thanks in advance for any help!
Best Regards, Rafal.