
Good morning;

We've been having intermittent power supply failures in our data center that have been preliminarily attributed to "zinc whiskers". I'm just starting to read about them (I just Googled the term and started picking through the results), and I'm interested in others' experiences with them, particularly with cleanup and recovery. Thanks!

Dizzle
  • I promised an update (...two years ago...sorry), and the microscope examination did reveal zinc whiskers on a few of the pieces from a failed power supply. We subsequently closed that data center - which was more like a large server room - rather than renovate it. Thanks again to all who offered their insight. – Dizzle Sep 28 '12 at 14:53

2 Answers


Are you asking about whiskers on RoHS circuit boards, or from the zinc coatings of raised floors and rack-mount equipment?

Not that it matters much... since they are thinner than a human hair, they should generally burn off before becoming a real problem. They are also large enough to be caught by even the most modest of air filters.

Within power supplies they should generally burn off before becoming a real problem. Conformal coatings help. They are primarily a signaling hazard in more sensitive logic circuits that have no conformal coating (not many of those around).

Most likely you just have a vendor grasping at straws to explain a poor-quality choice of power supply OEM. It would be nice if you could name names and model numbers. That might bring more (and more helpful) responses.

kmarsh
  • Thanks kmarsh; our issue has been primarily with power supplies for Sun equipment (V480 servers, 3310 SCSI arrays), Apple servers (1st gen Xserves), and Dell servers (PE 2650, 2850, 650, 1850). We've lost 37 over the span of 1 year so far (most of them failed in two separate larger occurrences), but troubleshooting by our electricians has hit a dead end. And to clarify, I'm asking about the raised floor version of the problem. We were close to attributing this to our power supply vendor of choice, but once it spread to other equipment, we were forced to look elsewhere. – Dizzle May 26 '10 at 17:41
  • Interesting. There are also power anomalies and grounding issues... these are hard to debug when they aren't occurring at the moment. – kmarsh May 26 '10 at 17:49
  • One thing that's somewhat consistent (maybe about 30% of the time), if the floor panels are removed and replaced there's a corresponding power supply failure. I almost think that putting the last floor panel in place "seals" the floor somewhat, but pushes the air below the floor to an escape point which happens to be the panel-less space under our affected racks? Just grasping and guessing...our next step is to examine the power supplies under a microscope. You mentioned that they should be burned off; so I'm thinking, why some and not others. Perhaps I'll see what temp viewing options I have. – Dizzle May 26 '10 at 18:00
  • +1 on the "burning off" comment -- In any circuit carrying real current (like a power supply) I expect whiskers would vaporize almost instantly. My understanding is they're more a concern in low-current logic circuits where they can introduce subtle errors – voretaq7 May 26 '10 at 18:02
  • @Dizzle - Interesting that the problem seems to correlate with removing floor panels; What kind of work is going on while the floor is open? If someone's touching power (e.g. on a snake bus) that could produce electrical faults that upset your power supplies. – voretaq7 May 26 '10 at 18:04
  • Thanks voretaq7; there were two major events, the first was precipitated by an inspection by an electrician to add 220 twist lock receptacles under the floor, the second happened when I moved some panels around to allow for better chair rolling (a few panels have cable cutouts). But there were several smaller happenings where panels hadn't been moved; for one failure, there was no one in the room at all. Another time, an electrician and I were LOOKING (literally) at the exposed floor (four panels were removed) and lost two power supplies. – Dizzle May 26 '10 at 18:28
  • I guess I can't rule out zinc whiskers in your case. Supposedly you'll hear a crackling sound as they short out. Have you gotten out a microscope yet? – kmarsh May 27 '10 at 12:22
  • Quick update for those interested; microscope examination of the floor panels revealed the presence of zinc whiskers. Our electricians are waiting to get access to an electron microscope to examine the failed power supplies. As soon as I have more I'll share. – Dizzle Jun 14 '10 at 15:28
  • This may be my last update, depending upon whether or not we decide to clean the data center or move out of it, but the electron microscope analysis confirmed the existence of whiskers in two of the failed power supplies we examined. If we decide to clean the room, I'll send a blurb or two about our cost and experiences. – Dizzle Jun 28 '10 at 18:01

I've never had a failure I could attribute to zinc/tin whiskers, though my sample set isn't enormous and I haven't ever really had a rash of power supply failures like you describe that would make me go on a hunt for a root cause.

I'd be looking at more conventional problems first (bad capacitors in the power supply or a transient electrical fault rate pretty high on my list, especially since you say you had two "large occurrences" of PSU failures), though it sounds like you already have.


My short list in case it differs from yours/your electrician's:

Electrically: poorly stabilized power because of a wonky UPS or PDU/CDU, ground faults, etc. If your "large occurrences" were in areas served by the same power distribution equipment this becomes more likely.

Environmentally: temperature and humidity. Check the inlet/outlet temperature of your equipment, especially if the failures happen in the same physical area of the datacenter; you may discover an airflow/cooling issue causing your gear to run hot.

Equipment/Manufacturer QC: check the dead power supplies for bulging/blown capacitors, especially if the failures are in units bought around the same time. Make sure you're not pushing the power supplies too hard (lots of hard drives and power-hungry CPUs may warrant a bigger PSU).

voretaq7
  • Yep, you've hit each point that our electricians have hit. The AC/humidifier unit has been reinspected; the first major incident actually damaged the UPS (a free-standing 16 kVA unit) and a new UPS was put in place due to the age/cost of repair of the old unit, but more incidents occurred afterward (nothing unusual in the UPS or PDU logs; the PDUs are also brand new); several of the failed power supplies have been cracked open by the electricians and compared to new ones. – Dizzle May 26 '10 at 18:36
  • Since all the obvious stuff has been hit, whiskers (or other floating conductive bits) are definitely in-bounds -- It would be very interesting if you can conclusively trace these failures to whiskers. – voretaq7 May 26 '10 at 20:47
  • At this point we're just waiting for the microscope examination, which I hope will be within the next week, but we've just decided to quarantine the space somewhat for now and plan to move things to alternate data centers. I've heard that recovery from this involves cleaning (and possibly replacing) the floor panels and wiping down everything else in the room. If I get a concrete diagnosis, I'll definitely update this; thanks again for your time. – Dizzle May 27 '10 at 13:07