I have a monitoring solution that uses Prometheus as the scraper and data store, Grafana as the visualiser, and Alertmanager as the alerting tool. All of this runs on a single server.
However, there's a problem with this approach. If the server hosting all of this goes down, I lose all monitoring, so if anything else crashes after that, I will never know.
I assume the best way to handle this would be to have two servers that somehow share the same information, so that I would be notified when one node in the setup goes down. But how should I set up Prometheus and Grafana so that they are not a single point of failure?
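To make the "notified when a node is down" part concrete, this is roughly what I have in mind: a minimal Python sketch, assuming two servers where each one periodically probes the other's `/-/healthy` endpoints (Prometheus and Alertmanager both expose one). The hostnames, the ports, and the `http_probe`/`find_down_peers` helpers are hypothetical placeholders, not part of my current setup.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, List

# Hypothetical peer list: the *other* server's services. Hostnames are
# placeholders; 9090 and 9093 are the usual Prometheus/Alertmanager ports.
PEERS: Dict[str, str] = {
    "prometheus-b": "http://monitor-b.example:9090/-/healthy",
    "alertmanager-b": "http://monitor-b.example:9093/-/healthy",
}


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or an HTTP error status
        # (HTTPError is a subclass of URLError).
        return False


def find_down_peers(peers: Dict[str, str],
                    probe: Callable[[str], bool] = http_probe) -> List[str]:
    """Return the sorted names of peers whose health check failed."""
    return sorted(name for name, url in peers.items() if not probe(url))
```

The idea would be to run something like `find_down_peers(PEERS)` from cron on both servers, each pointing at the other, and to send the alert through a channel that does not depend on the failed node.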
As far as I know, I can set up an Alertmanager cluster, but that won't solve the problem of a single Prometheus instance going down, so I'll have to replicate Prometheus somehow as well.