A post relating to a political issue is going around saying it is astronomically unlikely that 7 hard drives would crash in the same month, specifically:

The odds of a disk drive failing in any given month are roughly one in 36. The odds of two different drives failing in the same month are roughly one in 36 squared, or 1 in about 1,300...37 to the 7th power = 1 in 78,664,164,096.

My first reaction was that the proper odds would relate to 7 out of N and that, in an organization, N is likely to be much larger than 7. OTOH, if the claim concerns a specific mail server and its backup, N = 7 might make sense.

Unfortunately, at this point, my Google-fu cannot penetrate the echo-chambering of the "astronomical odds" post to know what has actually been claimed in terms of hard drive failure.

Also, to use (1/36)^7, the drive failures would have to be independent of one another, which I think could be relevant if the claim of data loss relates to a RAID configuration (?).
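To make the difference between "7 out of 7" and "7 out of N" concrete, here is a minimal Python sketch. It assumes independent failures and takes the post's 1-in-36 monthly rate at face value; the function name `prob_at_least` and the pool sizes are purely illustrative, not anything the IRS has stated. Note that 36 to the 7th power works out to 78,364,164,096, which is close to, but not the same as, the figure in the quoted post.

```python
from math import comb

# Per-drive monthly failure probability, taken from the quoted post (an assumption).
P_MONTH = 1 / 36

def prob_at_least(k: int, n: int, p: float) -> float:
    """P(at least k of n independent drives fail): the binomial upper tail."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# "7 out of 7": every drive in a pool of exactly 7 fails in the same month.
print("36^7 =", 36**7)  # 78364164096
print("P(7 of 7)   =", prob_at_least(7, 7, P_MONTH))

# "7 out of N": at least 7 failures somewhere in a larger, hypothetical pool.
for n in (20, 100, 1000):
    print(f"P(>=7 of {n}) =", prob_at_least(7, n, P_MONTH))
```

The probability climbs quickly as N grows, which is why the size of the pool matters more than the headline figure. If the failures are not independent (a shared RAID array, a bad batch of drives), even this model does not apply.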

I am skeptical that the IRS made the claim for which the post's math would be appropriate (i.e., I am skeptical that the IRS said 'the loss of email was due to the specific failure of 7 out of 7 independent drives'). As dmckee put it in comments: "[I am] asking what the IRS is actually claiming about hard drive failures that might be preventing them for complying with congressional requests for information about some [high politicized event that shouldn't play any part in the answers]."

P.S. I've had trouble finding appropriate tags for this question.

Oddthinking
Larry OBrien
  • He appears to be asking what the IRS is actually claiming about hard drive failures that might be preventing them for complying with congressional requests for information about some [high politicized event that shouldn't play any part in the answers]. Not that I believe there is a political component to either the requests or the inexplicable inability for the agency which requires me to keep meticulous records *or else* to keep half decent records. Really, I don't. No, seriously .... Stop laughing! – dmckee --- ex-moderator kitten Jul 13 '14 at 05:44
  • It seems the [IRS will be asked to explain their position to a judge soon](http://thehill.com/policy/finance/212001-federal-judge-demands-irs-answers-on-lerner-hard-drive) at which time it should be clearer. As far as I can make out, the IRS claims *one* personal computer (one HDD?) crashed losing an email, and their opponents are concluding that they are also claiming all the hard-drives of all the 7 people involved (how?) in the emails crashed too. – Oddthinking Jul 13 '14 at 06:11
  • Sorry, but your only doubt is if IRS claimed that 7 out of 7 hard drives failed at same time? Then everything else, including probabilities of failure, are irrelevant to the question... – woliveirajr Jul 14 '14 at 13:48
  • @woliveirajr My doubt is that the IRS' claim (whatever it is) is properly described as having a probability of 1/36^7th. I'm not particularly skeptical about an MTBF of 36 months, but more the proper description of the event. – Larry OBrien Jul 14 '14 at 18:20
  • I have deleted a number of comments that disputed the calculations based on the commenter's speculations that differ from the speculations of the OP. The OP's calculations serve their purpose of explaining why the OP is skeptical. – Oddthinking Jul 15 '14 at 18:55
  • Something to keep in mind: How do we know this isn't a case of shooting first and drawing the targets afterwards? (In other words, asking for the e-mail of people they knew had suffered drive crashes.) – Loren Pechtel Jul 15 '14 at 19:12
  • As an aside, a design flaw in the specific model of hard drive used might make them more prone to failure. They might even all fail at the same time. As an example (most likely not the model of HDD used): [HP Warns That Some SSD Drives Will Fail at 32,768 Hours of Use](https://www.bleepingcomputer.com/news/hardware/hp-warns-that-some-ssd-drives-will-fail-at-32-768-hours-of-use/) – Georg Patscheider Feb 25 '20 at 16:05

1 Answer

The official description of the event by the IRS is provided in a letter from Leonard Oursler (IRS) to the Committee on Finance.

Page 11 of the PDF states that each employee now has a 500 MB inbox, and that before July 2011 the limit was only 150 MB. Once that capacity is reached, the user should either archive the email on their own computer or print it and file it, depending on IRS policy.

It also states that if the hard drive crashes and cannot be recovered, or if it is recycled, no electronic version of the archived email would be retained. That's because (page 10) all server backup tapes were re-used after six months (before May 2013) to reduce costs.

On page 15, the IRS says that Lois Lerner's hard drive crashed in mid-2011 and could not be recovered.

Although the IRS is unable to interview Ms. Lerner to learn more, the IRS has determined that Ms. Lerner's computer crashed in mid-2011.

There are two attachments (E and F) that reproduce conversations about the hard disk failure.

The only information available so far indicating that six other employees also lost their emails comes from the Committee on Ways and Means, not from the IRS. A press release claims:

In addition to Lois Lerner’s emails, the IRS cannot produce records from six other IRS employees involved in the targeting of conservative groups. One of those figures is Nikole Flax, who served as Chief of Staff to Steve Miller, who at the time of the targeting was Deputy Commissioner and would later serve as Acting Commissioner of the IRS – a position from which he was fired for his role in the targeting of conservative groups. The timeframe for which Ms. Flax’s communications are purportedly unrecoverable covers when the Washington, DC office wrote and directed the Cincinnati field office to send abusive questionnaires, including inappropriate demands for donor information, to conservative groups.

Camp and Boustany also uncovered that the IRS has been keeping secret for months the fact that the Agency lost these critical records. Ways and Means investigators have confirmed that the Agency first knew of the destroyed emails as early as February 2014 – nearly three months prior to newly installed Commissioner John Koskinen telling the Committee the IRS would produce all of Lois Lerner’s emails.

Given the IRS's statement that each employee kept the "backup" of over-quota email on their own computer, and knowing that the IRS has 90,000 employees, you could work out the probability of 7 drives failing out of 90,000.
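As a rough illustration of that calculation, here is a minimal Python sketch. It assumes one drive per employee, independent failures, and the 1-in-36 monthly failure rate quoted in the question; none of these figures come from the IRS letter, and the variable names are hypothetical.

```python
from math import comb, exp, log

N_DRIVES = 90_000  # assumption: one hard drive per IRS employee
P_MONTH = 1 / 36   # assumption: monthly failure rate from the post quoted in the question

# Expected number of drive failures across the whole agency in one month.
expected_failures = N_DRIVES * P_MONTH
print("expected failures per month:", expected_failures)

def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Natural log of P(exactly k of n independent drives fail)."""
    return log(comb(n, k)) + k * log(p) + (n - k) * log(1 - p)

# P(fewer than 7 failures) is far too small for a float, so work in log space
# and combine the seven terms with the log-sum-exp trick.
logs = [log_binom_pmf(k, N_DRIVES, P_MONTH) for k in range(7)]
peak = max(logs)
log_p_fewer_than_7 = peak + log(sum(exp(x - peak) for x in logs))
print("log10 P(fewer than 7 failures):", log_p_fewer_than_7 / log(10))
```

Under these assumptions, seven or more failures somewhere in the agency in a given month is a near certainty, so the agency-wide number says little on its own. What matters is how many people's email was requested and over what period, which the documents above do not establish.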

I'd also point out that, as far as I know, the IRS's precise statement has not been released, nor has it been said whether all the hard drives failed at the same time, at almost the same time, or how far apart the failures were.

woliveirajr
  • Just one point: The chance of failure of 7 drives out of 90,000 is not what we should be concerned about... the quote from the linked ways and means committee says "In addition to Lois Lerner’s emails, the IRS cannot produce records from six other IRS employees involved in the targeting of conservative groups." The six other employees are selected from a much smaller pool, those 'involved in targeting'. If that pool was, say, six people, that's highly unlikely. If it was a thousand, it's not very unlikely. We don't seem to know enough to figure the odds. – Ask About Monica Jul 15 '14 at 18:14
  • @kbelder please, feel free to edit it and correct it. Since I'm not sure (yet) about the claim, I left this as a community wiki – woliveirajr Jul 15 '14 at 18:17
  • @kbelder - It's also a question of time period. 7 drives over 3 years is very different than 7 in 3 months, even out of a small pool of people. – Bobson Jul 15 '14 at 18:52
  • I am tempted to remove any mention of the calculations. It provokes speculation in the comments. Just leaving the explanation of the source of the claim addresses the question. – Oddthinking Jul 15 '14 at 19:05
  • I don't know that calcs should be removed; it's very relevant if the 6 problems (beyond Lerner's) are the entirety of the group for whom data was requested or are 6 out of (say) 1000. From my perspective, one needs at least a rough sense of the odds to inform one's opinion about whether the claim of data loss is credible or not. (Putting aside the irony of the IRS saying "record keeping is *hard*"!) – Larry OBrien Jul 15 '14 at 19:22
  • We have seen only one claim of data-loss due to HDD failure. We haven't seen the claim that the other data was lost due to that. It may be another process, such as deleting hard-drives of employees who leave the company or another measure. Note: I am speculating here, which is precisely the pointless navel-gazing activity I am trying to avoid. I am just doing it to demonstrate we don't have enough data to make any hard claims about plausibility at this stage. – Oddthinking Jul 15 '14 at 20:21