
Let's assume that I have a modern magnetic (not SSD) hard drive, manufactured within the past ten years, and the hard drive is packed with the only copies of an unpublished paper about cheap cold fusion written by a scientist who died in the fire that burned down the only cold fusion lab in the world.

dd if=/dev/zero of=/dev/sdx

Whoops.

I've heard people say that if you want to be sure that the data is completely erased, you need to:

  1. Use random data, not zeroes.

  2. Zero the hard drive with multiple passes, either 7 ("standard" practice) or 35 (for the truly paranoid). (A sketch of what the first two items look like in practice follows this list.)

  3. Erase using passes that have special alternating patterns. Supposedly, this either degausses the original signal or adds enough extra noise that you can't pick it out.

  4. Perform some kind of "low level" erase that causes the heads to move in different patterns, or causes the bit patterns to differ. This requires hardware support.

  5. Raise the platters above their Curie temperature
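
For concreteness, here is a minimal sketch of what the first two items typically look like on Linux, assuming the target disk is /dev/sdX (a placeholder) and that GNU coreutils' dd and shred are available:

# Item 1: a single pass of random data rather than zeroes
dd if=/dev/urandom of=/dev/sdX bs=1M

# Item 2: multiple overwrite passes; GNU shred defaults to 3 random passes,
# -n 7 raises that to 7 and -z adds a final pass of zeroes
shred -v -n 7 -z /dev/sdX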

My question is: is it actually possible to recover data from a zeroed drive? In other words, is it justifiable to use random data / multiple passes / etc. to bury your digital secrets?

I understand that "The Great Zero Challenge" (40 USD prize) has not been won, but hypothetically, if such a recovery is expensive enough, or kept secret enough, then the unclaimed prize is a moot point.

Dietrich Epp
  • I'm unclear. The claim you want us to look at is that "To safely erase data, it is necessary to write random data OR write out zeroes multiple times OR use special hardware techniques." You think it might be sufficient to do a single pass of zeroing out data. Is that correct? – Oddthinking Nov 15 '12 at 10:14

1 Answer


TL;DR: It seems that data densities on HDs have increased to the point where it's not feasible to recover data from zeroed drives.

The process of recovering data from a "zeroed" hard drive revolves around the concept of residual magnetism. Essentially, the idea is that if you examine the drive using a magnetic force microscope, there is some tiny difference between bits that were '1' before being overwritten and bits that were '0'. The ArchLinux wiki has some interesting background for those interested. That said, the process requires a clean-room disassembly of the drive and expensive equipment even to attempt.

The situation is actually even worse for would-be recovery; consider the NIST Guidelines for Media Sanitization [PDF], which states (emphasis mine):

Advancing technology has created a situation that has altered previously held best practices regarding magnetic disk type storage media. Basically the change in track density and the related changes in the storage medium have created a situation where the acts of clearing and purging the media have converged. That is, for ATA disk drives manufactured after 2001 (over 15 GB) clearing by overwriting the media once is adequate to protect the media from both keyboard and laboratory attack.

EDIT: Another study supporting the above is Overwriting Hard Drive Data: The Great Wiping Controversy (sadly behind a paywall), which concludes (emphasis mine):

This study has demonstrated that correctly wiped data cannot reasonably be retrieved even if it is of a small size or found only over small parts of the hard drive. Not even with the use of a MFM or other known methods. The belief that a tool can be developed to retrieve gigabytes or terabytes of information from a wiped drive is in error. Although there is a good chance of recovery for any individual bit from a drive, the chances of recovery of any amount of data from a drive using an electron microscope are negligible.

EDIT 2: It's actually not clear whether the above summary was referring to a "pristine drive and 1 wipe" or a "pristine drive and 3 wipes". I'd lean towards "it doesn't matter", since they give the chance of recovering a single 32-bit number at around 1.16%, even with a single pass. The authors also assert that no meaningful amount of data can be recovered with confidence levels which are "beyond reasonable doubt", normally a legal requirement in the U.S.
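
To see where a number like that comes from: if, as a rough assumption, each individual bit is recovered independently with probability of about 0.87 (the value that makes the paper's 32-bit figure work out), then an error-free n-bit value comes back with probability 0.87^n. A quick sanity check on the command line, with that 0.87 per-bit figure used purely for illustration:

# probability of an error-free 32-bit word, assuming ~87% per-bit recovery
awk 'BEGIN { printf "%.2f%%\n", 0.87^32 * 100 }'   # prints roughly 1.16%

# same assumption for a 64-bit (8-character ASCII) password
awk 'BEGIN { printf "%.4f%%\n", 0.87^64 * 100 }'   # prints roughly 0.0135%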

To summarise, it seems that a single pass is sufficient to sanitise the drive to a point where it cannot be recovered with today's technology.

Daniel B
    And “negligible chances” is an understatement. Recovering a single 32-bit word correctly has a 1.6% chance. Consequently, the chances of recovering a 8-character password would be 0.0265% (if you knew where exactly it was!). Recovering a 1kb file (a [love letter](http://edition.cnn.com/2012/11/12/us/petraeus-cia-resignation/index.html), say) has a chance of exactly 0 (according to [Google calculator](https://www.google.com/search?q=1.6%25+**+1000)). Or, if you want to be exact, 0.0…256%, where “…” corresponds to almost 2000 zeroes. – Konrad Rudolph Nov 15 '12 at 15:44
    @KonradRudolph Indeed, and the password would also have to be in plain text for that to work. To be 100% correct though, I think those probabilities represent getting a completely error-free version back; so it's possible that the info could be used to narrow down an attack probabilistically. It's still a ridiculously unlikely scenario, especially if you consider that the numbers I've quoted are for a "pristine" drive, and according to the authors: a “used” drive has only a marginally better chance of any recovery than tossing a coin. – Daniel B Nov 15 '12 at 16:46
    I like this answer, but the Arch Linux wiki seems like an out-of-place citation, since it offers no reasoning to support its claim nor provides references to further reading on the topic. – Dietrich Epp Nov 15 '12 at 19:04
  • @DietrichEpp good point, it was intended as a background link, but actually didn't add much. I've removed the quote, but left the link for background info. – Daniel B Nov 16 '12 at 06:17
  • @KonradRudolph I consider 1.16% for a 32-bit word huge, not negligible. That's an 87% chance of recovering a single bit (assuming independence), so even small redundancies in the message can lead to complete recovery. – CodesInChaos Sep 25 '13 at 13:08
    @CodesInChaos No, that doesn’t follow at all. In order to exploit any such redundancy in the retrieval you need to have *some* information about its structure. Furthermore, if you assume independence then you cannot at the same time assume a structure (even across words rather than within them). So essentially the only thing you could *conceivably* recover is redundancies that you possessed before recovery – in other words, information-less redundancy. – Konrad Rudolph Sep 25 '13 at 13:25
    The NIST document has been superseded by http://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-88r1.pdf which does not have the cited paragraph anymore. By the way, you say "*cannot be recovered with today's technology*", but higher density just calls for more precise tools, an arms race that a governmental agency can definitely win. – nic Jul 30 '15 at 08:57
    @nic thanks for the update. The new document is a bit more wishy-washy on this point, but does not rule out single pass as good enough, e.g. "For ... magnetic media, a single overwrite pass with ... binary zeros typically hinders recovery of data even if state of the art laboratory techniques ... attempt to retrieve the data" and "The Clear pattern should be at least a single write pass with a fixed data value". I also would not use the word "definitely"; the previous documents seemed to imply the exact opposite (as in, the "arms race" was trending towards non-recoverability). – Daniel B Jul 30 '15 at 11:21
  • I wonder how they come to this conclusion. Is it just really, really hard to build a magnetism reader much more precise than those commonly in use now? Because theoretically, if you knew the exact magnetization level you would probably have a pretty good idea what the value was. From my understanding, the cost and work involved mean that as far as we know it has never even been tried, but it is hard to imagine why it would be so hard to accomplish if you did have your mind set on that task. – Jonathon Jul 31 '15 at 01:45
  • Actually, the "1.6% per word" figure seems to be made by assuming a random bit corruption rate of 12% per bit, which simply requires that the original data have 1/8 parity bits (identical to ECC RAM) to reconstruct the data. – March Ho Jul 31 '15 at 13:35