We have configured application metrics in Prometheus and Grafana. I am getting alerts intermittently, and only for short durations, so I am unable to capture the error that caused the metric to go down. By the time I check Prometheus after an alert arrives, everything looks fine and all services are up and running, so I cannot see the exact error that is causing the system to go down. How can I implement a script to capture that error from Prometheus, and which parameters do I need to include in that script?
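To make the question concrete, here is a minimal sketch of the kind of script I have in mind. It assumes the Prometheus HTTP API is reachable at `http://localhost:9090` (a placeholder), and that the built-in `up` series is a reasonable proxy for "the service went down". The function names are hypothetical. It uses two API endpoints: `/api/v1/query_range`, whose parameters are `query`, `start`, `end`, and `step`, to look back over the alert window, and `/api/v1/targets`, which reports each target's `health` and `lastError` at the moment it is polled.

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder address, replace with your Prometheus server.
PROMETHEUS_URL = "http://localhost:9090"


def query_range(query, start, end, step="15s", base_url=PROMETHEUS_URL):
    """Call /api/v1/query_range and return the parsed JSON response.

    start and end are Unix timestamps (or RFC 3339 strings);
    step is the resolution of the returned samples.
    """
    params = urllib.parse.urlencode(
        {"query": query, "start": start, "end": end, "step": step}
    )
    url = f"{base_url}/api/v1/query_range?{params}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)


def fetch_targets(base_url=PROMETHEUS_URL):
    """Call /api/v1/targets and return the parsed JSON response."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)


def down_intervals(response):
    """Return (labels, first_down_ts, last_down_ts) for each run of samples equal to 0."""
    intervals = []
    for series in response["data"]["result"]:
        labels = series["metric"]
        run_start = run_end = None
        for ts, value in series["values"]:
            if float(value) == 0:
                if run_start is None:
                    run_start = ts
                run_end = ts
            elif run_start is not None:
                # The run of failed scrapes ended at the previous sample.
                intervals.append((labels, run_start, run_end))
                run_start = run_end = None
        if run_start is not None:
            # The series was still down at the end of the queried window.
            intervals.append((labels, run_start, run_end))
    return intervals


def unhealthy_targets(response):
    """Return (scrapeUrl, lastError) for every active target whose health is not 'up'."""
    return [
        (t["scrapeUrl"], t["lastError"])
        for t in response["data"]["activeTargets"]
        if t["health"] != "up"
    ]
```

For example, `down_intervals(query_range("up", start, end))` with `start` and `end` set around the alert time should list which targets failed their scrapes and when. Because `lastError` only reflects the most recent scrape, `unhealthy_targets(fetch_targets())` would need to run on a short polling interval and write its output to a log to catch a brief outage.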