5

Recently we added a couple of web service machines, and they couldn't send email. We (IS) did not notice: the exceptions were being swallowed and logged, and no one caught it for about a month.

Needless to say, many purchase orders, and retractions of purchase orders, were never sent out during that month.

While this isn't really any one person's fault, is there any GOOD way to break this to someone non-technical who is higher up in the company than you?

Thanks in advance for any advice, I'm freaking out just a bit. :)


Edit: Reading this over, I'm more asking for tips on how to break the news. I understand there isn't a GOOD way, just maybe successful tips that have worked for you in the past.


Resolution, in case anybody was wondering: the new web service machines' IPs weren't added to our mail server's list of trusted IPs. :)

skaffman
Dano
  • 1
    Not programming related - perhaps can go on the IT version of stackoverflow once it makes its appearance... – Adam Davis Feb 24 '09 at 03:47
  • "The offending tech has been sacked." Seriously. This is not a small, trivial mistake. – Adam Davis Feb 24 '09 at 03:48
  • @Adam It makes no sense to fire someone based solely on the severity of a technical mistake. That's just retribution to make someone feel better, inflicting pain and punishment. Completely irrational and emotion-driven. Firing is for when it's a clear, un-fixable misfit between company and employee. – Rex M Feb 24 '09 at 03:53
  • @Rex - it's situation dependent, of course. However, if the techs are supposed to know how to set up a server and test it, and they failed, causing 1 dollar or over 30 million dollars in 'damage', then yes - they failed at their job and they should be considered for firing. – Adam Davis Feb 24 '09 at 04:08
  • Severity may not have anything to do with it, but as I said this type of mistake in a business critical system when they should have tested it thoroughly before betting the business on it is unacceptable regardless of loss. – Adam Davis Feb 24 '09 at 04:09
  • @Rex, if enough money was lost -- not counting the long-term damage done to my business's reputation -- I would fire the tech in a heartbeat. And I'd increase the tech budget so that I could hire a replacement with more experience, who knows how to do proper testing. – Matt Howell Feb 24 '09 at 04:35
  • @Adam, I wouldn't recommend firing for this kind of mistake. Techs are only human too; we make bugs in software and yet we don't get fired even for pretty major ones. I do recommend discipline and perhaps more training if it's warranted though. – Adam Hawes Feb 24 '09 at 04:52
  • @Adam I'm glad you're not my boss. Even taking an extra day to do something could be considered a "damage" by some people. If you immediately fire everyone who cost the company money, you won't have very many employees. Get procedures in place to prevent this type of failure in the future. – Knobloch Feb 24 '09 at 15:51
  • @bigmattyh that's a very expensive way to run an organization. If that was the tech's only noteworthy mistake, it's much cheaper to invest in training and positive motivation to improve. – Rex M Feb 24 '09 at 18:58

8 Answers

31

Put emphasis on the fact that the problem was discovered and fixed swiftly by your team. Have detailed metrics on the number of failures, which customers were affected, etc. ready and in hand. Have a contingency plan ready to describe that will prevent similar issues from happening in the future. Engender a sense of camaraderie with the higher-up, because you are all on the same team and it's a team problem. If you convey a sense of urgency and give the impression that you appreciate the impact on the bottom line as much as they do, they will respond much better.

Lowly techs often make the mistake of going to upper management with their tail tucked between their legs, like a child who shamefully shows his parents the lamp he broke and waits for a spanking. You are an adult and a professional - leap into action and coordinate the right people to be in place to make the right decisions to fix it. In a case like this, that inevitably means bringing in upper management, but do so with an intention of solution seeking, not fear.

Rex M
  • +1 for the details and metrics to show that you do care – Brandon Feb 24 '09 at 03:57
  • 1
    It took them A MONTH to NOTICE!!! This is in no way "fixed swiftly". No amount of sugar can coat this fiasco. – Huntrods Feb 24 '09 at 04:25
  • 3
    @huntrods not about sugar-coating, it's about giving proper recognition to the positive aspects of the situation. It doesn't matter if it wasn't noticed for a day or a year, if it was fixed within the hour once someone did realize it. That's worth kudos. – Rex M Feb 24 '09 at 15:06
  • Just have a plan in place to notice that kind of thing faster in the future... – Knobloch Feb 24 '09 at 15:44
  • Taking the steps to assure the customer that it will not happen again is critical. Don't just tell them what you will do, show them. – Chris Ballance Feb 24 '09 at 18:43
  • +1 for "with their tail tucked between their legs, like a child who shamefully shows his parents the lamp he broke and waits for a spanking." :) – Sophia Feb 25 '09 at 08:21
8

You bring shame to your department. You know what you must do.

http://en.wikipedia.org/wiki/Seppuku

Ted Dziuba
4

Gee, bad news for ya - but it is someone's fault.

The folks who built the server and installed the apps and signed off on putting them into production use without testing them. :-)

Pretty much the only way to break this to the management is to acknowledge the MAJOR FUBAR and show them the plan for making sure this kind of situation doesn't happen again.

Good luck. :-)

Ron Savage
  • just love that *I* get to break the news, even though I'm not responsible for either that software or the hardware. – Dano Feb 24 '09 at 03:46
  • 1
    Actually, getting to present it is key - focus on what "your" organization did wrong and how you will fix it - downplay what everyone else did wrong, just list it in the sequence of "what happened". You don't want to appear to be pointing fingers. :-) – Ron Savage Feb 24 '09 at 03:52
3

Raise the issue as soon as possible.

Come with a clear plan, or a list of steps, for mitigating the problem:

  • how to fix the issue, so further processing works fine
  • is it possible to determine which transactions are affected
  • what is necessary to ensure this does not happen again - automated tests for deployment, preproduction stage for new servers, anything else?
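One way the "automated tests for deployment" item could look is a small smoke test that a new machine has to pass before it takes production traffic. This is only a rough sketch, not anyone's actual setup: the function name `check_outbound_mail`, the host name and the `example.com` addresses are all hypothetical, and it assumes the relay speaks plain SMTP on port 25. It uses Python's standard `smtplib` to push one probe message through the relay and reports a refusal instead of swallowing it.

```python
import smtplib
from email.message import EmailMessage


def check_outbound_mail(smtp_host, sender, recipient, host_name,
                        port=25, timeout=10, smtp_factory=smtplib.SMTP):
    """Try to relay one probe message; return (ok, reason).

    All names here are illustrative. smtp_factory exists only so the
    check can be exercised without a real mail server.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Deployment smoke test from {host_name}"
    msg.set_content("Probe sent by the deployment smoke test. Safe to ignore.")

    try:
        with smtp_factory(smtp_host, port, timeout=timeout) as smtp:
            # send_message returns a dict of recipients the server refused.
            refused = smtp.send_message(msg)
    except (smtplib.SMTPException, OSError) as exc:
        # A relay that does not trust this machine's IP typically fails here.
        return False, f"{type(exc).__name__}: {exc}"

    if refused:
        return False, f"recipients refused: {refused}"
    return True, ""


if __name__ == "__main__":
    # Hypothetical values; replace with the real relay and a monitored mailbox.
    ok, reason = check_outbound_mail(
        "mail.example.com", "smoketest@example.com",
        "ops@example.com", "web03")
    raise SystemExit(0 if ok else f"outbound mail check failed: {reason}")
```

Run as a deployment step, a non-zero exit would block the rollout. Note that this only shows the relay accepted the message; it does not prove the message was delivered.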

Be proactive in resolving the situation. As long as it's not a direct fault of yours, you might even benefit from the whole snafu.

Franci Penov
  • 1
    Along those lines, why is the OP discussing it here instead of being in his manager's office the minute he figured out the problem with a solution in hand? – NotMe Feb 24 '09 at 05:01
  • Yeah, I know. With such a serious problem, he shouldn't be wasting time here. – Franci Penov Feb 24 '09 at 05:03
  • It was fixed before I left work. Just trying to figure out how to tell other departments the next morning. – Dano Feb 24 '09 at 12:52
2

Being honest and direct is the best, rather than trying to cover up certain aspects of what happened.

Don't blame anyone, simply accept that a problem happened, propose a solution, and execute on that solution. Communicate this plan to your superiors and be clear about why you are taking the steps you are taking to solve the problem.

The time to find responsible parties and assign blame comes after; solving the issues that have to do with collecting money from customers comes first.

Once the immediate problem is solved, then find a way to ensure that whatever caused this problem cannot happen again. Have a plan.
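As one possible piece of such a plan, failures that are currently only logged could also be escalated once they repeat. The sketch below is an illustration under assumptions, not a description of the poster's system: `MailFailureMonitor`, the `alert` callback and the threshold of 3 are all hypothetical names and values. It wraps whatever function sends the mail, keeps logging each failure, and calls `alert` once a run of consecutive failures hits the threshold.

```python
import logging
from typing import Callable

logger = logging.getLogger("outbound_mail")


class MailFailureMonitor:
    """Log every failed send and alert once failures keep repeating.

    Hypothetical helper: `alert` is any callable that reaches a human
    (pager, chat message, ticket), supplied by the caller.
    """

    def __init__(self, alert: Callable[[str], None], threshold: int = 3):
        self.alert = alert
        self.threshold = threshold
        self.consecutive_failures = 0

    def send(self, send_fn: Callable[[], None]) -> bool:
        """Run send_fn; return True on success, False on failure."""
        try:
            send_fn()
        except Exception as exc:
            self.consecutive_failures += 1
            logger.error("outbound mail failed: %s", exc)
            # Alert exactly once per run of failures, when the threshold is hit.
            if self.consecutive_failures == self.threshold:
                self.alert(
                    f"{self.consecutive_failures} consecutive outbound mail "
                    f"failures; last error: {exc}"
                )
            return False
        # Any success ends the run of failures.
        self.consecutive_failures = 0
        return True
```

Alerting once per run, rather than on every failure, is a deliberate choice here so that a long outage does not flood whoever receives the alerts.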

matt b
2

Point out

  • What happened
  • Why it happened
  • What you think the fallout was (i.e., missed purchase order retractions)
  • What you've already done to fix it
  • What you still need to do (if more fixing is needed)
  • What management needs to do, say, spend (if needed)
  • What can be done to prevent similar incidents in the future

Be proactive about reporting it and spin the negative into a positive ("we've learned the following valuable lessons").

Avoid pointing the finger wherever possible unless asked, and try to spin that in a positive light too. Techs make mistakes; they are human after all. If they can learn from the mistakes made they're probably worth keeping around.

Adam Hawes
0

Whatever you do, make sure you have agreed on it beforehand with your immediate superior, at least. Even if you are the IS director.

dkretz
-1

Lie or cover it up :-). If you can shift the blame to a new intern, I'll award you 10 kittens!

Karl