
I work for a company that uses standard 2.5" SATA HDDs in our product. We presently test them by running the Linux 'badblocks -w' command on them when we get them, but they are 160 GB drives, so that takes about 5 hours (we boot Parted Magic on a PC to do the scan). We don't actually build that many systems at a time, so this is doable, but seriously annoying.

Is there any research or anecdotal evidence on what a good incoming test for a hard drive should be? I'm thinking that we should just wipe them with all zeros, write out our image, and do a full-drive read-back. That would take only about 1 hour 45 minutes in total.
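The time savings can be sanity-checked with a little arithmetic. A rough Python sketch, assuming every full pass (write or read) runs at the same average speed, inferred from the 5-hour badblocks run; since the image size isn't stated, the image write is modelled as a hypothetical fraction of one pass:

```python
# Back-of-the-envelope estimate of full-surface test time (a sketch).
# Assumption: every full pass, write or read, runs at the same average speed,
# inferred here from the observed ~5 hour `badblocks -w` run.

GB = 10**9  # drive capacities are quoted in decimal gigabytes


def throughput_bytes_per_s(capacity_bytes: int, passes: float, seconds: float) -> float:
    """Average throughput implied by `passes` full passes taking `seconds` in total."""
    return capacity_bytes * passes / seconds


def duration_s(capacity_bytes: int, passes: float, bytes_per_s: float) -> float:
    """Time needed for `passes` full passes (may be fractional) at `bytes_per_s`."""
    return capacity_bytes * passes / bytes_per_s


capacity = 160 * GB
# badblocks -w: 4 patterns, each written over the whole drive and read back = 8 passes.
rate = throughput_bytes_per_s(capacity, passes=8, seconds=5 * 3600)
print(f"implied throughput: ~{rate / 1e6:.0f} MB/s")

# Proposed: zero-fill (1 write) + full read-back (1 read), plus the image write,
# modelled as a hypothetical fraction of one pass because the image size isn't given.
for image_fraction in (0.0, 0.5, 1.0):
    minutes = duration_s(capacity, 2 + image_fraction, rate) / 60
    print(f"image covers {image_fraction:.0%} of the drive: ~{minutes:.0f} min")
```

Under that assumption each full pass costs about 37.5 minutes, so two passes take about 75 minutes and three just under two hours; the ~1 hour 45 minute estimate falls in between, depending on how much of the drive the image covers.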

Given that drives do block remapping on their own, would what I've proposed expose infant mortality as well as running badblocks does?

Michael Kohne

3 Answers


I think badblocks may still do what you want; you're just not passing it enough options.

By default, with the -w option it runs four passes over your hard drive, writing these patterns: 00000000, 01010101, 10101010, and 11111111. Each pass writes the pattern across the whole drive and then reads it back to verify it. You should probably pass the -t option and run just a single pass with one of those patterns. Running all four passes is probably more than you need.
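For reference, what one write-mode pass does can be sketched in Python. This illustrates the write-then-verify idea only and is not badblocks itself; it targets a scratch file (a hypothetical stand-in for the device) and does not bypass the OS page cache, so on a real disk the read-back here might never reach the drive surface.

```python
from __future__ import annotations

import os
import tempfile

# The four patterns listed above (badblocks' own ordering may differ).
PATTERNS = (0x00, 0x55, 0xAA, 0xFF)


def write_verify(path: str, size: int, pattern: int, block_size: int = 4096) -> list[int]:
    """Destructively fill `path` with `pattern`, read it back, return bad block indices.

    Sketch only: `path` should be a scratch file. No cache bypass, so reads may be
    served from the page cache rather than the disk surface.
    """
    block = bytes([pattern]) * block_size
    offsets = range(0, size, block_size)
    with open(path, "r+b") as f:
        # Write pass: cover the whole target with the pattern.
        for offset in offsets:
            f.write(block[: min(block_size, size - offset)])
        f.flush()
        os.fsync(f.fileno())
        # Read pass: compare every block against what was written.
        f.seek(0)
        bad = []
        for index, offset in enumerate(offsets):
            expected = block[: min(block_size, size - offset)]
            if f.read(len(expected)) != expected:
                bad.append(index)
    return bad


if __name__ == "__main__":
    # Single-pattern run on a 1 MiB scratch file, analogous to choosing one pattern with -t.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        scratch = tmp.name
    try:
        print(write_verify(scratch, 1 << 20, 0x00))
    finally:
        os.remove(scratch)
```

Each pattern costs one full write plus one full read, which is why cutting four patterns down to one shortens the run roughly fourfold.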

Zoredache
  • This is similar to what I was thinking. I'm still hoping that someone with a decent number of drives passing through their hands has some data on the subject; I've just got too few drives coming through here to do any sort of analysis. – Michael Kohne Jun 12 '12 at 11:30
  • A few years old now, but still definitely worth the read if you haven't already. http://research.google.com/pubs/pub32774.html – tudor -Reinstate Monica- Jun 04 '13 at 00:05

I'd recommend filling the drive with 0s, then 1s. Check the SMART values before, during, and after. Anything beyond that is overkill, if this isn't already.
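Comparing SMART snapshots can be scripted. A minimal Python sketch, assuming you save the text output of `smartctl -A` (from smartmontools) before and after the fill; the attributes watched and the choice to flag any increase are assumptions, not part of the advice above, and the sample snapshots are illustrative rather than real drive data:

```python
from __future__ import annotations

import re

# Attributes commonly watched for early drive failure; this selection is an assumption.
WATCHED = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}

# One row of the attribute table printed by `smartctl -A`:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
_ROW = re.compile(r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(?:\S+\s+){6}(\d+)")


def parse_smart_attributes(text: str) -> dict[int, int]:
    """Map attribute ID -> leading integer of RAW_VALUE; other lines are ignored."""
    attrs: dict[int, int] = {}
    for line in text.splitlines():
        match = _ROW.match(line)
        if match:
            attrs[int(match.group(1))] = int(match.group(3))
    return attrs


def smart_increases(before: dict[int, int], after: dict[int, int]) -> dict[str, tuple[int, int]]:
    """Watched attributes whose raw value went up between two snapshots."""
    return {
        name: (before.get(attr, 0), after[attr])
        for attr, name in WATCHED.items()
        if attr in after and after[attr] > before.get(attr, 0)
    }


# Illustrative snapshots, not real drive data.
BEFORE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""
AFTER = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
"""

print(smart_increases(parse_smart_attributes(BEFORE), parse_smart_attributes(AFTER)))
```

How to act on an increase (reject the drive, retest, or just log it) is a policy decision the sketch leaves to you.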

Hyppy

You still don't give any numbers. For example:

  • what percentage of drives fail this test;
  • what do you do with those;
  • do you keep detailed statistics by vendor/series.
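Answering those questions needs only a simple log. A minimal Python sketch, assuming you record vendor, series, and pass/fail for every incoming drive; the record layout and the vendor/series names are hypothetical placeholders:

```python
from collections import defaultdict


def failure_rates(results):
    """Tally incoming-test results per (vendor, series).

    `results` is an iterable of (vendor, series, passed) tuples; returns
    {(vendor, series): (failed, tested, percent_failed)}.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [failed, tested]
    for vendor, series, passed in results:
        tally = counts[(vendor, series)]
        tally[1] += 1
        if not passed:
            tally[0] += 1
    return {key: (failed, tested, 100.0 * failed / tested) for key, (failed, tested) in counts.items()}


# Hypothetical log entries; vendor and series names are placeholders.
log = [
    ("VendorA", "Series1", True),
    ("VendorA", "Series1", False),
    ("VendorA", "Series1", True),
    ("VendorB", "Series2", True),
]
for (vendor, series), (failed, tested, pct) in sorted(failure_rates(log).items()):
    print(f"{vendor} {series}: {failed}/{tested} failed ({pct:.1f}%)")
```

With only a handful of drives per series, a single failure moves the percentage a lot (1 of 3 is already 33.3%), so small-sample rates deserve caution.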

And the best one:

  • what result do you want to achieve.

Without numbers I can only say that you are wasting 1 hour 45 minutes per drive because of:

  • synthetic load;
  • generic environment;
  • any "stress test" has a chance of damaging the drive.

You may be surprised: if you just check SMART one month after deployment, you will get much more useful statistics, because:

  • each hard drive will be working under a real load;
  • each hard drive will be working in a real environment.
kworr