I have been given the responsibility of managing a team of 4 system admins. They are managing 70+ servers. They don't yet have written processes, procedures, or practices. I don't know much about system administration. Is there a standard we can follow to standardize our work or choose best practices?
How is it that you got this position without knowing anything about system administration? Granted I suppose not all IT managers are IT people but it would certainly help if you were. – joeqwerty Mar 17 '11 at 14:09
I would imagine this is a pretty common situation - an organizational shuffle leads to an 'abandoned' team of sysadmins that someone needs to manage. Something like this has happened to me at least once. – Phil Hollenback Mar 17 '11 at 20:15
This happens when a lead resigns and there is no suitable successor available within the team. I am also managing a development team, and this is an additional responsibility. – newbie Mar 21 '11 at 11:13
8 Answers
I'd endorse what others have said about not jumping in and laying down the law. You say the team, right now, is managing 70+ servers, so my first question is: how well are they doing? Is there lots of unscheduled downtime, working-day outages, constant scrambling to fix stuff just before it explodes? Or are they doing a pretty good job from a service-delivery standpoint, with only the occasional unforeseeable disaster of the sort that happens to us all to mar the peace?
If it's the latter, then you've got yourself a good team which seems to know what it's doing, and not trying to fix what isn't broken is an important part of not putting your team's backs up.
If it's the former, you may still have a good team; good teams can flounder because of a lack of support and engagement from the business (no budget for new kit, no agreement on compensation for the midnight work that would be required to upgrade things without working-day outages, no clear agreement on SLAs), or internal frictions, or a host of other non-technical reasons.
If it's the former, of course, you may just have an inadequate team.
The right response varies wildly across these three scenarios, and will also be affected by the personalities involved.
If you have a good team, working well, then let them lead you. What they're doing is right, but you need to understand what it is that they do, and how. They'll tell you, if you ask, and if you ask nicely they'll probably tell you in the most useful way, by writing it all down. Annual reviews and agreed-on goals are a good way of inserting more documentation into the working sysadmin's life. Essentially, what they're doing now is close to best practice, so try to get them to document it in a mutually useful way, rather than imposing anything on them.
If you have a good team working badly, they probably know what needs to change in order to become a good team working well. Listen to them, and work out how to convert their needs into justified requirements to be passed back to the business. You can add a lot of value as the bridge between tech world and business world, if you're prepared to listen to both sides, and say "no" to both sides in appropriate measure.
If you have a bad team working badly, then you have your work cut out for you. Identifying and documenting what's going wrong will be important in being able to discipline, and if necessary, replace people without exposing the business to liability. Identifying low-hanging fruit - things that could be easily nudged into going well - is important in getting some quick team-motivational and business-credibility wins, and baselining what's wrong is helpful here in being able to show that some quick improvements have been made.
I see I have rambled off-track somewhat, but I honestly believe that best-practice and standardisation exist to satisfy the needs of the business and the people to get the job done, rather than being some ivory pinnacle of documentation excellence standing alone in a vacuum, so my answer reflects my interconnected approach. I'm sorry if it's overlong!

I got a good team. Thanks for the detailed answer and guiding in the right direction. – newbie Mar 21 '11 at 11:35
Consider starting with ITIL: http://en.wikipedia.org/wiki/Information_Technology_Infrastructure_Library
ITIL gives detailed descriptions of a number of important IT practices and provides comprehensive checklists, tasks and procedures that any IT organisation can tailor to its needs.
Don't expect to read an ITIL book and know everything but it is a good place to start. Jumping in after reading ITIL and telling the sys admins "the new law" might get you some unhappy sys admins.
What I would suggest is sitting them down and discussing with them how best to improve the documentation, and how to cover time tracking/etc.

You might want to start with The Practice of System and Network Administration, Second Edition by Thomas A. Limoncelli, Christina J. Hogan, and Strata R. Chalup. There are some really great best practices outlined which will help you and your team on the right path. It's quite down-to-earth, and an easy read despite its length.
ITIL is a good thing to keep in mind, but it's very easy for people new to ITIL to hamstring themselves trying to implement everything it describes to the letter. Use what you need, keep in mind what you may need later, but don't let it keep you from doing the job your customers expect.

The most important thing to remember about ITIL is **adopt and adapt**: make it work for you rather than letting it get in your way. Unfortunately, far too many ITIL implementations get in the way. – user9517 Mar 17 '11 at 14:53
The other answers give some specific practical advice about things like ITIL, which I think is good. However, bear in mind that a lot of the standards really come down to doing the sensible thing and doing it in a repeatable way. You need to manage your servers consistently with configuration management tools such as Puppet or Chef. You need to track as many metrics as possible, and be as transparent as possible with your users. If you generally think about the big picture of providing a quality service and keeping your customers or users happy, you will do just fine. The fact that you are thinking about standards is a good sign.
One book I recently read about a lot of these topics was Web Operations. It has some good advice on how to do things like manage incident postmortems and how to gather metrics. Recommended.

As a sysadmin, I'd recommend focusing your team on:
- Documentation: not only system descriptions, but also change logs, documentation of all custom tools, and so on
- Server and service monitoring
- Automated deployment/configuration
These three aspects should make your team productive and your team members replaceable.

ITIL and COBIT are the leading standards. Our company works with ITILv3, but there was also some IT monitoring based on COBIT.
It's worth a quick look too: http://en.wikipedia.org/wiki/COBIT

The number one thing to quiz your team about is backup and recovery; make sure that's covered. As Tom Kyte says in relation to database administration, backup and recovery is the one thing you cannot afford to get wrong. Review that first and document it, especially any risks and the level of service you can commit to, and plug any gaps between reality and business expectation.
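As one way to make the "gap between reality and business expectation" concrete, here is a minimal sketch that flags hosts whose last successful backup is older than an agreed maximum age. The host names, timestamps, and the 24-hour threshold are all hypothetical; in practice the timestamps would come from whatever backup tool the team already uses.

```python
from datetime import datetime, timedelta


def stale_backups(last_backup, max_age, now):
    """Return hosts whose last successful backup is missing or older than max_age."""
    stale = []
    for host, finished_at in last_backup.items():
        # None means no successful backup has ever been recorded for this host.
        if finished_at is None or now - finished_at > max_age:
            stale.append(host)
    return sorted(stale)


# Hypothetical sample data, not taken from any real environment.
now = datetime(2011, 3, 21, 12, 0)
last_backup = {
    "web01": datetime(2011, 3, 21, 2, 0),   # 10 hours old
    "db01": datetime(2011, 3, 19, 2, 0),    # more than two days old
    "mail01": None,                         # never backed up
}

overdue = stale_backups(last_backup, timedelta(hours=24), now)
print(overdue)
```

A report like this only shows that backups ran; it says nothing about whether a restore works, so it complements a periodic recovery test rather than replacing one.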

Get feedback from the sysadmins (and possibly even developers) as to how processes could be improved. They are your number one source of information and will know the problems and bottlenecks better than anyone else.
Make sure your documentation process is up to scratch and automate as much of it as possible. People always forget to add notes to wikis about deployments and upgrades. Consider writing a server dashboard that allows them to quickly check how all the servers are running and what versions of software are installed on various boxes.
Automate, automate and automate (and document all the automations).
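The dashboard idea can start very small. The sketch below assumes you already collect, by some means, a mapping of host to installed package versions; the function name `find_version_drift` and the sample inventory are hypothetical. It reports every package that is installed at more than one version across the fleet, which is often the first thing worth seeing on such a dashboard.

```python
from collections import defaultdict


def find_version_drift(inventory):
    """Return {package: {version: [hosts]}} for packages present at more than one version."""
    by_package = defaultdict(lambda: defaultdict(list))
    for host, packages in inventory.items():
        for package, version in packages.items():
            by_package[package][version].append(host)
    # Keep only packages where the fleet disagrees on the version.
    return {
        package: {version: sorted(hosts) for version, hosts in versions.items()}
        for package, versions in by_package.items()
        if len(versions) > 1
    }


# Hypothetical inventory; real data would come from your own collection scripts.
inventory = {
    "web01": {"nginx": "1.18.0", "openssl": "1.1.1k"},
    "web02": {"nginx": "1.18.0", "openssl": "1.1.1f"},
    "db01": {"openssl": "1.1.1k"},
}

drift = find_version_drift(inventory)
for package, versions in sorted(drift.items()):
    print(package)
    for version, hosts in sorted(versions.items()):
        print(f"  {version}: {', '.join(hosts)}")
```

Keeping the collection step separate from the comparison step means the same function works whether the inventory comes from a script, a configuration management tool, or a spreadsheet export.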
