We are about to setup Prometheus for monitoring and alerting for our cloud services including a continous integration & deployment pipeline for the Prometheus service and configuration like alerting rules / thresholds. For that I am thinking about 3 categories I want to write automated tests for:
- Basic syntax checks for configuration during deployment (we already do this with promtool and amtool)
- Tests for alert rules (what leads to alerts) during deployment
- Tests for alert routing (who gets alerted about what) during deployment
- Recurring check if the alerting system is working properly in production
Most important part to me right now is testing the alert rules (category 1) but I have found no tooling to do that. I could imagine setting up a Prometheus instance during deployment, feeding it with some metric samples (worrying how would I do that with the Pull-architecture of Prometheus?) and then running queries against it.
The only thing I found so far is a blog post about monitoring the Prometheus Alertmanager chain as a whole related to the third category.
Has anyone done something like that or is there anything I missed?