2

Many people prefer to back up data to multiple backup storage locations, but this is not a perfect solution.

Let's say a video file is stored on a server that is connected to hundreds of other servers in a cluster. A backup mechanism automatically backs the data up to backup storage every day.

But one day a bad sector occurs (permanent disk damage) and affects that video file.

The backup mechanism just backs up the video as usual; the *nix server doesn't know the video file is damaged by the disk failure. After 2 months, the older backup snapshots are automatically deleted from the backup storage, so every remaining copy of that video file is a broken file.

When a visitor tries to play that video, it will get stuck in the middle. Imagine this happening on YouTube; it would be embarrassing.

I believe a backup mechanism like this is not effective and needs too much space.

So what is the best way to back up data so that it survives disk failure?

6 Answers

3

Maybe something like a monthly snapshot of the data, in addition to whatever other daily/hourly backups are taking place. Static data benefits from this: since it never changes, a backup from last month-end is the same as the month before, and so on.

It sounds like you are talking about a simple 2-month 'full' style backup, which of course will always be first in, first out. Even in the most basic of setups with, say, 2 weeks of tape, you would have 10 tapes doing your M-F backups for 2 weeks, plus a month-end tape. Those 10 weekly tapes will always be in rotation, and the oldest tape will always be overwritten every 2 weeks.
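
For illustration, that two-week M-F rotation could be modelled with a short Python sketch; the tape labels, the alternation on ISO week parity, and the omission of the month-end tape are assumptions made here, not part of the scheme described above:

    #!/usr/bin/env python3
    # Rough sketch of a 10-tape, two-week Mon-Fri rotation: two sets of
    # five tapes alternate by week, so the oldest set is overwritten
    # every two weeks. The month-end tape pulled out of rotation is not
    # modelled here.
    from datetime import date

    DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]

    def tape_for(day):
        if day.weekday() >= 5:      # weekends are not backed up in this scheme
            return None
        week_set = "A" if day.isocalendar()[1] % 2 == 0 else "B"
        return f"{week_set}-{DAYS[day.weekday()]}"

    if __name__ == "__main__":
        print(tape_for(date(2010, 8, 11)))   # -> "A-Wed" (ISO week 32)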

DanBig
    +1 - On the back of this most places would take either a weekend or month end backup and store this in an archive or offsite. – JamesK Aug 11 '10 at 12:54
  • +1 Why's it nuked after 2 months? Backup retention should be in multiple years at the very least imo for it to be called backup, otherwise it's just a fancy version of redundancy/replication... – Oskar Duveborn Aug 11 '10 at 13:06
2

This is why grandfather-father-son backup rotations are used. Though I find myself going back through the months' worth of tapes because a user overwrote or misused their file far more often than because of any hardware issue.
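
As a rough sketch of how such a rotation can be expressed, the Python below keeps the last 7 daily, 4 weekly and 12 monthly backups; those counts, and the choice of Sunday weeklies and 1st-of-month monthlies, are only example assumptions:

    #!/usr/bin/env python3
    # Minimal grandfather-father-son retention sketch. The counts (7/4/12)
    # and the Sunday / 1st-of-month choices are illustrative assumptions.
    from datetime import date, timedelta

    def backups_to_keep(backup_dates, dailies=7, weeklies=4, monthlies=12):
        dates = sorted(backup_dates, reverse=True)                        # newest first
        keep = set(dates[:dailies])                                       # sons: most recent dailies
        keep.update([d for d in dates if d.weekday() == 6][:weeklies])    # fathers: Sunday backups
        keep.update([d for d in dates if d.day == 1][:monthlies])         # grandfathers: month starts
        return keep

    if __name__ == "__main__":
        history = [date(2010, 8, 11) - timedelta(days=n) for n in range(400)]
        for d in sorted(backups_to_keep(history)):
            print(d.isoformat())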

Kara Marfia
2

To protect retained data, you can implement a checksum system: cross-check MD5 checksums weekly and halt backup deletion if a checksum error occurs, then restore the problematic file(s) from a known-good backup.
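
A minimal sketch of that weekly crosscheck, assuming an md5sum-style manifest file (the name md5sums.txt is made up here): it exits non-zero on any mismatch, so a wrapping rotation job can refuse to delete old backups.

    #!/usr/bin/env python3
    # Verify files against a previously saved md5sum-style manifest and
    # exit non-zero on any mismatch, so backup deletion can be halted.
    import hashlib
    import sys

    def md5_of(path, chunk_size=1 << 20):
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    def verify(manifest="md5sums.txt"):
        failures = []
        with open(manifest) as f:
            for line in f:
                expected, path = line.rstrip("\n").split("  ", 1)
                path = path.lstrip("*")                 # md5sum binary-mode marker
                try:
                    if md5_of(path) != expected:
                        failures.append((path, "checksum mismatch"))
                except OSError as e:                    # unreadable file is also a failure
                    failures.append((path, str(e)))
        return failures

    if __name__ == "__main__":
        bad = verify()
        for path, reason in bad:
            print(f"FAIL {path}: {reason}")
        sys.exit(1 if bad else 0)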

Long-term data retention is a pain, indeed.

Volume snapshots don't help here, because unless the file is written to between snapshots, the bad block is never copied into the VSS cache file; the snapshot still references the damaged data.

Posipiet
1

This is more of a data retention policy question. Personally, if you have a huge file that people use every day, yet no one noticed it was corrupted for a couple of months, I'd question how valuable the data is; but there are scenarios where this can happen.

Anyway, the solution could be to take periodic archives of the data and put them into storage before anything is permanently deleted: yearly, every 6 months, etc., so that if the data were ever completely purged, you would still have it on a "just in case" storage platform.

But again, this is a question of data retention policy. If you're very worried about something like this, you could use a checksum system that compares files over time to see if anything has changed; this also gives you intrusion detection for files that are altered when they shouldn't be.
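
A small sketch of that comparison, assuming two md5sum-style manifests taken at different times (the file names here are made up): it lists files whose checksums changed or that disappeared between the two runs.

    #!/usr/bin/env python3
    # Compare two md5sum-style manifests and report files that changed
    # or vanished between the two runs.
    def load_manifest(path):
        entries = {}
        with open(path) as f:
            for line in f:
                digest, name = line.rstrip("\n").split("  ", 1)
                entries[name] = digest
        return entries

    def compare(old_path, new_path):
        old, new = load_manifest(old_path), load_manifest(new_path)
        changed = [n for n in old if n in new and old[n] != new[n]]
        missing = [n for n in old if n not in new]
        return changed, missing

    if __name__ == "__main__":
        changed, missing = compare("md5sums.2010-07.txt", "md5sums.2010-08.txt")
        for name in changed:
            print("CHANGED", name)
        for name in missing:
            print("MISSING", name)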

Bart Silverstrim
0

When you have a permanent disk error on a sector, you will be informed about it and the backup of that file will fail. If you don't read your log files, though, bad luck.

Sven
  • Instead of "hard" damage, what about "soft" data corruption? – user48838 Aug 11 '10 at 13:04
  • You aren't always informed of it... we had an URE on a RAID 5 array and it was never picked up until another disk in the array failed. It refused to rebuild and we had to replace 2 drives instead of 1. In a RAID 5 you know what that means... – Bart Silverstrim Aug 11 '10 at 13:05
  • Bart: Ouch. Reminds me why I am sitting around nervous today while I wait for one of my RAID6s to rebuild ... Nevertheless, even if you have an URE on one drive and RAID delivers nonetheless from the parity information, you should get a good backup. When a sector dies that is not read until a rebuild, then you have a problem during rebuild. If we could get bad reads without an error message, storage would be completely unreliable. So, if you only throw away your old backups after successfully doing a new one, you should be good. – Sven Aug 11 '10 at 13:51
  • Soft corruption: The only way to protect against this is keeping as many versions of the file as possible in backup, for as long as you care about it. Of course, deduplication helps a lot in this, especially if this works on the block level. – Sven Aug 11 '10 at 13:57
0

Use SIS (single-instance storage) or deduplicated storage, where multiple backup sessions are kept in the backup store but deduplicated, so that only unique objects (files or data blocks, depending on the implementation) are actually added across the sessions. That way, any change to the original file results in a new object within the SIS/dedup system, while older, intact versions remain referenced by earlier sessions. SIS/dedup is also very space-efficient: only "net new" objects are stored as additions to the backup store, and all "repeating" objects are just links back to the single stored instance.
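
As a rough sketch of block-level single-instance storage (the 4 MiB block size and the in-memory store are just illustrative assumptions): each file is split into blocks, a block is stored only the first time its hash is seen, and a backup session just records the list of block hashes.

    #!/usr/bin/env python3
    # Toy block-level SIS/dedup store: blocks are kept once under their
    # content hash; repeated blocks become references to that instance.
    import hashlib

    BLOCK_SIZE = 4 * 1024 * 1024        # 4 MiB blocks (arbitrary choice)
    store = {}                          # hash -> block data (single-instance store)

    def backup_file(path):
        """Return the file's block-hash list; only new blocks enter the store."""
        hashes = []
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(BLOCK_SIZE), b""):
                digest = hashlib.sha1(block).hexdigest()
                if digest not in store:         # "net new" object: store it once
                    store[digest] = block
                hashes.append(digest)           # repeats are just references
        return hashes

    def restore_file(hashes, out_path):
        with open(out_path, "wb") as f:
            for digest in hashes:
                f.write(store[digest])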

user48838