
We have several K8s clusters which we need to monitor from one operator cluster (cluster A). We are using Prometheus on each cluster to monitor the cluster itself. In addition, we now want to monitor via a specific API of our applications, which tells us whether the cluster is functional (according to our specific services). I'm not talking about monitoring the cluster itself: we want the operator to monitor 3 applications on each cluster (all 3 applications are deployed on all the monitored clusters).

Cluster A (the operator) should monitor the services/apps on clusters B, C, D, etc.

For example, the operator cluster will call an app deployed in each monitored cluster, such as host://app1/status, to get its status (0 or 1), save the status in some DB (maybe the Prometheus TSDB), and report it outside the cluster.

After some searching I found the following options, but there may be others I don't know about:

  1. Use the Blackbox Exporter - https://github.com/prometheus/blackbox_exporter

  2. Write my own program (in Go) that runs like a cronjob in the operator cluster and uses the Prometheus client library: https://github.com/prometheus/client_golang

     I mean making a REST call and using the Prometheus API to store the status inside the Prometheus TSDB via the Go "github.com/prometheus/client_golang/prometheus/promhttp" package, but I'm not sure how.

  3. Federation?

In addition, if I manage to collect all the data from the clusters into the operator cluster, how and where should I keep it? In the Prometheus TSDB? Some other way?

What is the best practice for our case, and how should we implement it?

Beno Odr
  • Is there a reason not to use normal Prometheus federation here? – coderanger Jul 06 '20 at 06:23
  • @coderanger - thanks for the reply. We have considered it already and we want to use Thanos, but since we have some obstacles sending the data to an on-prem system, we will do that at a later time. For now we need to use some internal monitoring system. What do you suggest for the two options I mentioned? – Beno Odr Jul 06 '20 at 06:52

2 Answers


I see that you thought about using Thanos. It's not bad, and we had it running in production for a while, but it didn't fit our requirements well. Your requirements look similar to ours, so I suggest you take a look at VictoriaMetrics. There is a nice comparison article here: https://medium.com/faun/comparing-thanos-to-victoriametrics-cluster-b193bea1683

Another big plus is their support on Slack. Good luck implementing it!

Mark Davydov

Ideally you would instrument your code and expose Prometheus-compatible metrics for whatever needs to be monitored. But there is something to be said for blackbox and/or 3rd-party monitoring and smoke testing.
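As an illustration of the instrumentation route, here is a hypothetical sketch of an application exposing its own health both as the existing plain `0`/`1` `/status` body and as a Prometheus gauge. The names `app_functional` and `setFunctional`, and port 8080, are made up for the example; with `client_golang` the hand-written text output would normally be replaced by a `prometheus.Gauge` and `promhttp.Handler()`.

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"sync/atomic"
)

// functional holds the app's own health verdict: 1 = functional, 0 = not.
var functional int32

// setFunctional is a hypothetical hook; call it from whatever
// application-specific checks decide whether the service works.
func setFunctional(ok bool) {
	var v int32
	if ok {
		v = 1
	}
	atomic.StoreInt32(&functional, v)
}

// metricsText renders the gauge in the Prometheus text exposition format.
func metricsText() string {
	return fmt.Sprintf("# HELP app_functional 1 if this app considers itself functional.\n"+
		"# TYPE app_functional gauge\n"+
		"app_functional %d\n", atomic.LoadInt32(&functional))
}

func main() {
	setFunctional(true)

	// The existing-style endpoint: a plain "0" or "1" body.
	http.HandleFunc("/status", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "%d", atomic.LoadInt32(&functional))
	})
	// The same state, scrapeable by the Prometheus already running in each cluster.
	http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		fmt.Fprint(w, metricsText())
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

With this in place, each cluster's own Prometheus can scrape `app_functional`, and the operator cluster could pull just that series through Prometheus federation (the `/federate` endpoint) rather than probing every app remotely.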

The http module in Blackbox Exporter is probably what you want (I've used it similarly before). If that isn't flexible enough for the testing you need to do, I like to run custom testing scripts in AWS Lambda that record the results in CloudWatch (if running in AWS; otherwise use the equivalent in your environment). If you haven't done that before, there is a bit of a learning curve, but it is well worth the effort.

If the APIs are externally accessible, services like Pingdom and Site24x7 offer flexible testing options (for a price), and it is generally recommended to utilize a 3rd party for at least basic up-time testing for the cases where your entire environment goes down--along with all of your monitoring!

But, it does sound like you just want to do some basic blackbox style monitoring which the Blackbox Exporter would be well suited to. It will need a host to run on, and then you'll need to add a job for it to Prometheus' scrape config. Best practice is to use each host for a single purpose, so I'd provision a specific host for the purpose of running blackbox exporter (even if it is just another container in the cluster).