Our team has decided to switch to Prometheus for monitoring, so I am wondering how to set up a highly available, fault-tolerant Prometheus installation. We have a bunch of small projects running on AWS ECS, and almost all of our services are containerized. I have a few questions.
Should we containerize Prometheus itself?
That would mean running 2 EC2 instances, each with one Prometheus container and one Node Exporter, plus a highly available Alertmanager (clustered via Weave Mesh) running as one container per instance on separate instances.
Or should we just install the Prometheus binary and the other components directly on EC2 and forget about containerizing them?
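Either way, to make the layout concrete, here is a rough sketch of the configuration I have in mind, written as a small Python script that builds a prometheus.yml-shaped structure. The host names are hypothetical placeholders, the ports are the usual defaults (9100 for Node Exporter, 9093 for Alertmanager), and the 30s scrape interval is just an assumption. The idea is that both Prometheus replicas run the same configuration, scrape the same targets, and each sends alerts to every Alertmanager so that the Alertmanager cluster can deduplicate them.

```python
import json

# Hypothetical host names; these would be the real EC2 addresses.
PROMETHEUS_HOSTS = ["prom-a.internal", "prom-b.internal"]
ALERTMANAGER_HOSTS = ["am-a.internal", "am-b.internal"]


def build_config():
    """Return a prometheus.yml-shaped dict shared by every replica."""
    return {
        "global": {"scrape_interval": "30s"},  # assumed value
        "scrape_configs": [
            {
                # Every replica scrapes the Node Exporter on both hosts.
                "job_name": "node",
                "static_configs": [
                    {"targets": [f"{h}:9100" for h in PROMETHEUS_HOSTS]}
                ],
            }
        ],
        "alerting": {
            "alertmanagers": [
                {
                    # Every replica lists all Alertmanagers directly,
                    # rather than going through a load balancer.
                    "static_configs": [
                        {"targets": [f"{h}:9093" for h in ALERTMANAGER_HOSTS]}
                    ]
                }
            ]
        },
    }


# One identical config per Prometheus replica.
configs = [build_config() for _ in PROMETHEUS_HOSTS]

if __name__ == "__main__":
    for cfg in configs:
        print(json.dumps(cfg, indent=2))
```

The same structure would apply whether Prometheus runs in a container or as a plain binary; only the way the file is delivered to the process changes.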
Any ideas? Are there any best practices for a highly available Prometheus setup?