We're looking for a short checklist/template to show that a change has actually worked and hasn't broken anything. This is being introduced as part of a more formal change management process. Does anyone have experience of doing this, and what worked?
1 Answer
Manual change verification is for Lesser Sysadmins. Real BOFHs Automate.
All of our systems are comprehensively monitored (and I do mean comprehensively, down to little niggly details like "are all the NICs on this system currently running at Gigabit speed"), and all of our change plans end with "ensure that monitoring is clear at time X" (where "time X" is "end of maintenance window, less the estimated rollback time plus a fudge factor, because everything takes longer than you expected, even rollbacks").
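The "time X" arithmetic can be sketched in a few lines. This is only an illustration: the function name `verification_deadline` and the choice of a multiplicative fudge factor of 1.5 are assumptions, not something our change plans prescribe.

```python
from datetime import datetime, timedelta


def verification_deadline(window_end, rollback_estimate, fudge_factor=1.5):
    """Latest time by which monitoring must be clear.

    Hypothetical helper: pads the rollback estimate by a multiplicative
    fudge factor (an assumption) and subtracts it from the window end,
    so a rollback started at the deadline still fits in the window.
    """
    padded_rollback = rollback_estimate * fudge_factor
    return window_end - padded_rollback


# Example: window closes at 06:00, rollback is estimated at 40 minutes.
window_end = datetime(2009, 12, 22, 6, 0)
deadline = verification_deadline(window_end, timedelta(minutes=40))
print(deadline)  # 2009-12-22 05:00:00
```

If monitoring is not clear by that time, the rollback starts; there is no "just five more minutes".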
If for some reason the part of our system that's being changed isn't already comprehensively monitored, step one of the change plan is "improve monitoring" (with a detailed list of what needs to be monitored, how, and why, plus details of how the monitoring response documentation needs to be improved to match).
The benefits of this are severalfold:
- We don't have to spend time checking things by hand, because everything's constantly being monitored for us
- There's no chance that someone will either make a mistake on the verification, or speak a mistruth about whether they tested everything
- All of this monitoring to ensure changes didn't break anything also makes sure that we know about problems during normal operation: anything that can break during a change can almost certainly break on a day-to-day basis, so it's good to know about those sorts of things all the time.
A simple plan for getting from the start line to fully-monitored utopia is to set up a monitoring infrastructure, and then for each change plan make the first step "set up monitoring for the services I'm going to change". Setting that up doesn't take much longer than writing and executing a comprehensive test plan anyway, and the benefits are long-term (that monitoring is constant and forever, and the next time you have to change something there you save the time of writing and executing another test plan).
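As an illustration of how small one of these "niggly detail" checks can be, here is a minimal sketch of the NIC speed check. It assumes a Linux host that exposes link speed in `/sys/class/net/<iface>/speed`, and a monitoring system that understands the Nagios plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names and the default interface `eth0` are hypothetical, not what we actually run.

```python
import sys
from pathlib import Path

# Nagios plugin exit-code convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_nic_speed(iface, expected_mbps=1000, sysfs_root="/sys/class/net"):
    """Return (status, message) for one interface.

    Assumes Linux sysfs reports the link speed in Mb/s; a link that is
    down may report -1 or fail to read at all.
    """
    speed_file = Path(sysfs_root) / iface / "speed"
    try:
        speed = int(speed_file.read_text().strip())
    except (OSError, ValueError) as exc:
        return UNKNOWN, f"UNKNOWN - cannot read {speed_file}: {exc}"
    if speed < expected_mbps:
        return CRITICAL, f"CRITICAL - {iface} at {speed} Mb/s, expected {expected_mbps}"
    return OK, f"OK - {iface} at {speed} Mb/s"


def main(argv):
    """Plugin entry point: print one status line, return the exit code."""
    iface = argv[1] if len(argv) > 1 else "eth0"  # hypothetical default
    status, message = check_nic_speed(iface)
    print(message)
    return status
```

Wired into the monitoring system as a plugin (calling `sys.exit(main(sys.argv))` from the script), this runs every few minutes forever, which is the point: the post-change verification is just "is this check still green?".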

- Is the beer free and all the people beautiful where you live as well? In all seriousness, what you're talking about is an end game that is 5-10 years away for us; we need some baby steps first. – MrTelly Dec 21 '09 at 22:37
- The beer is free here, as it happens, but being sysadmins we're not the most attractive bunch of people. – womble Dec 21 '09 at 22:43