I recently took over managing a small startup. Like most small startups, I would think, we have been making changes in production pretty much whenever we thought it was okay. People are careful and things have worked very well. We have also been able to resolve issues very quickly, which our clients are very grateful for.
However, yesterday we had an issue where an admin, on their own initiative, decided to rename a server and update its software to bring it more in line with everything else. The devs were notified, but the name change killed our message queue system, which in turn basically shut us down for hours. This set off a series of cascading failures, and the VM hosting the message queue had to be killed and a new VM created. No one was pleased.
This change should have been verified in a non-production environment first.
What maintenance should be allowed in production during business-critical times? Some, I would imagine, but how much?