47

One of our favourite StackExchange Overlords, Jeff Atwood, wrote a Coding Horror blog article in May 2011, decrying the unreliability of SSD drives.

Solid state hard drives fail. A lot. And not just any fail. I'm talking about catastrophic, oh-my-God-what-just-happened-to-all-my-data instant gigafail. It's not pretty.

[Emphasis his.]

He then gave anecdotes about 11 out of 12 SSDs failing within 18 months (average lifetime: around 227 days, according to one commenter). The commenters added more anecdotes.

Dire predictions indeed.

Meanwhile, the manufacturers are painting a far rosier picture:

(I haven't chased down all of the devices he mentioned, just enough to show a discrepancy between the claims. I have tried to match the tech specs to the devices that Atwood describes, but that can be tricky sometimes; feel free to double-check.)

So, my question is: Who should I believe?

(Possible approaches include: Are there any independent tests? Are there formal, audited procedures that manufacturers must follow before making an MTBF claim? Is MTBF a misleading measurement? Have Atwood and his cronies just been unlucky?)

Oddthinking
  • 1
    When SSD hard drives were new, my suppliers saw a higher failure rate than with non-SSD hard drives. I'm unsure if this has changed since then, but the price factor has kept me and my clients away from these drives as well. (_I'm a computer consultant and one aspect of the relationship I have with my suppliers is that we all prefer to deal with equipment that fails less because warranty service isn't profitable._) – Randolf Richardson Nov 26 '11 at 02:47
  • 2
    @Randolf: Yes, no-one wins when a drive fails before its warranty period finishes. And HDD MTBF ratings are a concern (e.g. Seagate HDDs: [1.2 million hours](http://seagate.custkb.com/seagate/crm/selfservice/search.jsp?DocId=174791&NewLang=en); they prefer to use Annualized Failure Rate (AFR)). Google seem to have taken down their [famous report](http://static.googleusercontent.com/external_content/untrusted_dlcp/labs.google.com/en//en//papers/disk_failures.pdf), but [they found much higher AFRs for HDD than expected](http://storagemojo.com/2007/02/19/googles-disk-failure-experience/). Cont'd. – Oddthinking Nov 26 '11 at 03:01
  • 1
    There are two "buts" though. (1) As Atwood points out, the higher failure rate may still be acceptable for some purposes (e.g. OS & swap space on a personal PC). (2) If we were to believe the 2 million hour MTBF from the manufacturers, SSDs are comparable to HDDs. – Oddthinking Nov 26 '11 at 03:03
  • Note that it's hard to figure out what usage pattern he is talking about, but at least some of the cases he writes about involve heavy server usage. Unlike with HDDs, where you can use commodity drives in servers, with SSDs it's completely different: technology (MLC vs SLC), controller chips (way more advanced for servers), over-provisioning (server ones do way more), garbage-collection strategies, etc. In one of the cases Jeff describes, commodity SSDs were under heavy use in StackExchange servers. See also: http://serverfault.com/questions/229833/is-it-safe-to-use-consumer-mlc-ssds-in-a-server – vartec Nov 26 '11 at 12:10
  • I've been having a good look, but it seems there has been very little study on this topic. The most scientific data I've found is from a French retailer, and the statistics measure return rates, which is very different to failure rates. [Tomshardware summary here.](http://www.tomshardware.com/reviews/ssd-reliability-failure-rate,2923-3.html) – John Lyon Nov 28 '11 at 00:22
  • And, confusingly, any research will have to correct for the drive's actual manufacturer. An X-branded drive can have a controller from any of Y companies: (Intel = Intel only), (Corsair = JMicron, Toshiba, Indilinx, SandForce), (Crucial = Indilinx and Marvell), (Kingston = JMicron, Toshiba, Indilinx), (OCZ = JMicron, Toshiba, Indilinx, SandForce, Samsung) – John Lyon Nov 28 '11 at 00:25
  • 1
    @jozzas. Thanks for searching and thanks even more for the link. Interesting to see the various views. I'm not convinced they have considered the enterprise-class versus consumer-class SSD differences sufficiently, but it certainly adds fuel to the fire. – Oddthinking Nov 28 '11 at 00:35
  • Perhaps it's now worthy of a separate question, but when SSDs fail are they more likely to have a complete catastrophic failure? And BTW what is the definition of failure in the existing answers? I assume at least they are comparing like with like within each result set/graph. – Mark Hurd Mar 05 '13 at 02:31
  • @Muhammad: This is off-topic here. Please take to chat or meta. – Oddthinking Dec 27 '14 at 17:22

3 Answers

9

Jeff's findings are anecdotal and by no means a representative sample. If SSDs did have a ~90% failure rate, as Jeff's numbers would indicate, then various trade commissions would have stepped in by now.

Some informal studies indicate that SSDs have a lower failure rate than traditional hard-drives.

This investigation by website BeHardware found the following figures for return (not failure) rates for SSDs by manufacturer.

Intel 0.3% (against 0.6%) - Kingston 1.2% (against 2.4%) - Crucial 1.9% (against 2.2%) - Corsair 2.7% (against 2.2%) - OCZ 3.5% (against 2.9%)

A study by Intel put the failure rate of solid state drives at 0.61% with traditional hard-drives being at 4.85%. Intel is a biased source but their methodology seems quite sound and transparent here.

Tom's Hardware, a reputable source reporting on computer hardware, did an investigation in which they basically say that more data is needed.

They did have this nice chart, however: [chart from the Tom's Hardware article; image not preserved]

Quoting from their conclusion:

Giving credit where it is due, many of the IT managers we interviewed reiterated that Intel's SLC-based SSDs are the shining standard by which others are measured. But according to Dr. Hughes, there's nothing to suggest that its products are significantly more reliable than the best hard drive solutions. We don’t have failure rates beyond two years of use for SSDs, so it’s possible that this story will change. Should you be deterred from adopting a solid-state solution? So long as you protect your data through regular backups, which is imperative regardless of your preferred storage technology, then we don't see any reason to shy away from SSDs. To the contrary, we're running them in all of our test beds and most of our personal workstations. Rather, our purpose here is to call into question the idea that SSDs are definitely more reliable than hard drives, based on today's limited backup for such a claim.

The claim Tom's Hardware investigated is that SSDs are more reliable than HDDs. The claim that SSDs have an over 90% failure rate, as per your question, seems limited to anecdotes and should not be taken seriously.

There is no evidence that would indicate that failure rates are as extreme as the ones claimed in your question.
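For a rough sense of how far apart the anecdote and the vendor-style figures are, here is a minimal sketch. It assumes the 2 million hour MTBF mentioned in the comments, a constant failure rate (exponential lifetime model), and independent failures; all three are simplifying assumptions, not something the manufacturers or Atwood state.

```python
import math

HOURS_PER_YEAR = 8766  # 365.25 days * 24 hours


def failure_probability(mtbf_hours: float, hours_in_service: float) -> float:
    """Probability that one drive has failed by `hours_in_service`,
    assuming a constant failure rate (exponential lifetime model)."""
    return 1.0 - math.exp(-hours_in_service / mtbf_hours)


def prob_at_least(k: int, n: int, p: float) -> float:
    """Binomial tail: chance that at least k of n independent drives fail."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


mtbf = 2_000_000  # hours; the figure mentioned in the comments (assumed here)
afr = failure_probability(mtbf, HOURS_PER_YEAR)
p_18_months = failure_probability(mtbf, 1.5 * HOURS_PER_YEAR)
tail = prob_at_least(11, 12, p_18_months)

print(f"Implied annualized failure rate: {afr:.2%}")
print(f"Chance one drive fails within 18 months: {p_18_months:.2%}")
print(f"Chance at least 11 of 12 fail within 18 months: {tail:.1e}")
```

Under these assumptions, 11 failures out of 12 would be vanishingly unlikely. So the two accounts cannot both describe the same population of drives used in the same way: either the anecdotal sample is unrepresentative, or the MTBF figure does not reflect real-world failures.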

Sonny Ordell
4

Underwriters Labs is supposed to be an independent source of information about electronics products, although they test mainly for consumer safety. However, they also certify against "quality" standards such as ISO/IEC 11801 and NEMA WC66.

This paper (PDF) mentions that there are basically three grades of SSDs (Consumer, Commercial, and Industrial), and that some testing standards are not applied across each type, resulting in claims that may not all match up.

Another good organization to go to is Consumer Reports. They have not done a comparative review of SSDs; however, they have reviewed some computers with an SSD and were impressed by the speed of applications that access drive memory.

All other information on MTBF of these drives sadly comes from the manufacturers themselves, so I would take that with a grain of salt. As SSDs become more commonplace in the marketplace, perhaps there will be more information from independent testing sources. For now, understand that standards are not being advertised consistently, and may be leading to confusion.

Comparing product attributes alone is not enough to distinguish between the three emerging SSD categories: Consumer, Commercial or Industrial Grade. While Consumer Grade SSDs are more recognizable since they are optimized for the lowest cost per gigabyte, the distinction between Commercial and Industrial Grade SSD is less obvious. Comparing product specifications doesn’t adequately uncover this distinction, and an in-depth review of the vendor’s design, testing and manufacturing procedures is needed to assess the category into which the drive falls.

A vendor that sells Commercial Grade SSDs will most likely not include many of these practices. Typically, environmental testing is done on non-operational drives, burn-in testing is limited or non-existent, the design phase does not include HALT testing and margin testing, and manufacturing testing is done on lot samples.

Larian LeQuella
  • 2
    Having skimmed the PDF, I understand there are differences between Consumer, Commercial and Industrial. I understand that environmental factors affect MTBF, which is an important point. However, even if the drives listed in the question are the lowest quality consumer drives, the discrepancy between the manufacturer's MTBF claims and Jeff Atwood's findings is still extraordinary. – Oddthinking Jan 31 '12 at 14:24
4

The big question is what exactly the manufacturers mean by MTBF, and whether the MTBF really matters for SSDs. As far as I understand it, the lifetime of an SSD depends a lot on how you use it, namely how many writes you perform. Each flash cell will survive only a limited number of erase cycles (a few thousand). So in a desktop computer you probably won't run into trouble, but if you use the SSD in a write-heavy server, it'll wear out a lot earlier.
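To make the desktop-versus-server difference concrete, here is a back-of-the-envelope sketch. The drive size, cycle count, daily write volumes and write-amplification factors are illustrative assumptions, not measurements. It also assumes perfect wear leveling and ignores every failure mode other than flash wear-out.

```python
def estimated_lifetime_years(
    capacity_gb: float,
    pe_cycles: int,
    host_writes_gb_per_day: float,
    write_amplification: float = 1.0,
) -> float:
    """Rough wear-out estimate, assuming writes are spread evenly
    over all cells (ideal wear leveling)."""
    # Total host data the flash can absorb before cells hit their cycle limit.
    total_writable_gb = capacity_gb * pe_cycles / write_amplification
    return total_writable_gb / host_writes_gb_per_day / 365.25


# Hypothetical 120 GB drive rated for 3000 erase cycles per cell.
desktop_years = estimated_lifetime_years(120, 3000, host_writes_gb_per_day=10,
                                         write_amplification=1.5)
server_years = estimated_lifetime_years(120, 3000, host_writes_gb_per_day=1000,
                                        write_amplification=3.0)

print(f"Desktop (10 GB/day):  about {desktop_years:.0f} years")
print(f"Server (1000 GB/day): about {server_years:.2f} years")
```

With these made-up numbers the same drive outlives the desktop by decades but wears out within months under the server load. That kind of gap is something a single MTBF figure cannot express.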

There's a test in the German computer magazine c't that you can buy for 1,50€. I read it a few weeks ago, and the main thing I remember is that they didn't manage to produce failures, but they did manage to reduce the speed of some of the SSDs substantially. (The method was to continually write random data and to fill the SSD to the limit in order to defeat clever wear-leveling algorithms.) Much more extended tests can be found on xtremesystems.org; there they actually managed to produce failures, but I didn't read the details.

Hendrik Vogt