Production grade methodology for alerts

Question

Background

Our code is written with:

Unit tests
End to end tests
Code review
Staging process
Deployment process

Our alerts, by contrast, are simply written once and then occasionally modified by hand. There is no quality process at all.

That is reasonable for simple threshold checks. However, some of our alerts are built on complicated queries, sometimes around 20 lines long.

If we accidentally break an alert, we are exposed to production instability, because we will no longer be notified when the logic or component it monitors breaks.

The question

Is there a recommended methodology for validating the quality of complicated alerts?

P.S.

We're using Splunk alerts.

score 2 · Accepted Answer · answered May 04 '20 at 11:30

Splunk does not have a documented practice for validating alerts, if that's what you are looking for. I suggest following a practice similar to the one you use for code. Unit testing is not possible, but you can test modified alerts on a non-production system, using either a sample of production data or synthesized data.
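
As a rough illustration of testing an alert against synthesized data, here is a minimal Python sketch of a small harness. Everything in it is an assumption for illustration and not a Splunk API: `AlertCase`, `check_alert`, and the `run_search` callable are hypothetical names, and the alert is assumed to trigger whenever its query returns at least one row. In real use, `run_search` would load the events into a non-production Splunk instance and run the query there; `fake_runner` below is only a stand-in so the sketch runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, object]
# Hypothetical signature: takes the alert query and the synthesized events,
# and returns the result rows the query produced on the test system.
SearchRunner = Callable[[str, List[Event]], List[Event]]


@dataclass
class AlertCase:
    name: str
    events: List[Event]   # synthesized input data
    should_fire: bool     # whether the alert is expected to trigger


def check_alert(query: str, cases: List[AlertCase],
                run_search: SearchRunner) -> List[str]:
    """Return the names of cases where the alert did not behave as expected."""
    failures = []
    for case in cases:
        rows = run_search(query, case.events)
        fired = len(rows) > 0  # assumption: alert triggers on any result row
        if fired != case.should_fire:
            failures.append(case.name)
    return failures


def fake_runner(query: str, events: List[Event]) -> List[Event]:
    """Stand-in for a real runner: mimics a query that returns hosts
    with more than 2 ERROR events. It ignores the query text."""
    counts: Dict[object, int] = {}
    for event in events:
        if event.get("level") == "ERROR":
            counts[event["host"]] = counts.get(event["host"], 0) + 1
    return [{"host": h, "count": c} for h, c in counts.items() if c > 2]


cases = [
    AlertCase("quiet system", [{"host": "a", "level": "INFO"}], False),
    AlertCase("error burst", [{"host": "a", "level": "ERROR"}] * 3, True),
]

failures = check_alert("<alert query under test>", cases, fake_runner)
print(failures or "all cases passed")
```

Keeping the cases in version control next to the alert query would let them go through the same code review and staging steps as the rest of the code, and each change to a query could be checked against both a "should fire" and a "should stay silent" case before it is deployed.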
