
I have read a lot about ZFS with/without ECC, and there are quite a few opinions online. I still have doubts that I could not resolve by reading the available documents.

Suppose I have two mirrored disks under ZFS (no ECC in my system). Let's see what can go wrong:

1) One drive gets silently corrupted -> no problem, the other drive is fine and ZFS recovers.

2) Both drives are OK, but during a scrub a single-event upset flips a bit in a memory cell, so ZFS might think that a cluster on one of the two disks is corrupted, and at this point ZFS might corrupt a cluster that was good.

Now my question concerns case 2): after ZFS has found a bad cluster (due to non-ECC RAM or due to a real issue on the disk), why isn't there some sort of second chance/retry? I mean, a bad cluster on the disk isn't going to disappear, while a bad memory cell in RAM is a local thing, so ZFS could try to read the disk again using other RAM cells. Also, it could be that the RAM was actually OK and the bit flip was just a transient one (due to a cosmic muon), so another attempt, even using the same memory cell, would clear the issue. Does such a technique exist, and/or is it possible? Does it make sense?
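To make the idea concrete, here is a minimal sketch of the "second chance" I have in mind. It is only a toy model and not how ZFS actually works: `verify_with_retry`, `FlakyReader` and the use of SHA-256 are my own assumptions for illustration, and it ignores the case where the stored checksum itself is damaged in RAM.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    # SHA-256 stands in for whatever checksum the filesystem uses (assumption).
    return hashlib.sha256(data).digest()


def verify_with_retry(read_block, expected: bytes, attempts: int = 3) -> bool:
    """Return True if any fresh read of the block matches the expected checksum."""
    for _ in range(attempts):
        buf = read_block()  # each attempt reads the block again into a new buffer
        if checksum(buf) == expected:
            return True
    return False


class FlakyReader:
    """Hypothetical reader whose first read suffers a one-off bit flip in RAM."""

    def __init__(self, data: bytes):
        self.data = data
        self.calls = 0

    def __call__(self) -> bytes:
        self.calls += 1
        buf = bytearray(self.data)
        if self.calls == 1:
            buf[0] ^= 0x01  # transient single-bit flip, only on the first read
        return bytes(buf)


good = b"block contents on disk"
expected = checksum(good)

# Transient RAM error: the second read is clean, so the block is not flagged.
transient = FlakyReader(good)
transient_ok = verify_with_retry(transient, expected)

# Real on-disk corruption: every read returns the same wrong bytes.
persistent_ok = verify_with_retry(lambda: b"block contents on disc", expected)

print(transient_ok, persistent_ok)
```

In this toy model a transient flip is cleared by the retry, while corruption that is really on the disk fails every attempt and would still be reported.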

Tiutto
  • Reality is, if the data matters, use ECC. Don't skimp on this. It'll bite you. I run 100TB in 2 RAIDz2's + ECC. – Thomas Jul 21 '18 at 01:23
  • do you agree that the overall probability of losing data due to a bit flip is 1 over 2^256 (to cause a hash collision), as I am reading from http://jrs-s.net/2015/02/03/will-zfs-and-non-ecc-ram-kill-your-data ? – Tiutto Jul 21 '18 at 01:50
  • I work for a data backup company. We sell devices without ECC, but they are backed up in the cloud as they are intended as intermediate devices. However, we do not sell any enterprise devices without ECC. This has nothing to do with chance; it has everything to do with hardware failure. If RAM goes bad, it will take your data. I would never risk my personal data unless it was a secondary backup. – Thomas Jul 22 '18 at 04:24
  • @Thomas *I run 100TB in 2 RAIDz2's* You are a braver man than I. I'd think rebuild times have to be getting pretty scary for that? – Andrew Henle Jul 22 '18 at 14:17
  • @Thomas, why, if the RAM goes bad, will the data go bad? The data are on disks, and unless there is a checksum collision (ruled just by statistics/chance) ZFS will not write to the disks. If RAM goes bad, let's say ZFS will be "less capable" of scrubbing properly, but it shouldn't mess up the data. Or am I missing something? – Tiutto Jul 23 '18 at 13:37
