
I have a backup system that handles large amounts of data. I use rsync to back up the data to a remote directory. At the remote location, the data is archived to tape for security reasons.

The data is static. Once created it does not change.

I am now considering off-lining some of the data to tape to free up disk space at the remote location. Off-lining keeps the file system structure intact, so existing files can still be browsed without recalling them to disk.

I have been looking at how to manage this through rsync. Since tape storage is not made for fast data retrieval, I want to keep rsync from recalling off-lined files from tape. Will the -W flag achieve this?

Falcon Momot
repelente
  • Do you have some special interface to your tape device that presents as a file system, or are you just assuming that you can write to tape as if it were a FS? – MadHatter Nov 18 '14 at 11:26
  • I am not writing to tape. I will always be writing to the file system. Off-lining the data to tape keeps the file system structure, though. I will always be able to see the file name, modification date, and size. The rsync option is -a, so timestamps are preserved. – repelente Nov 18 '14 at 11:31
  • OK, then I really don't understand what you're asking. It still looks to me as if you're trying to treat tape as if it were a file system (either on read or on write). Is that so, or can you clarify? – MadHatter Nov 18 '14 at 11:47
  • How do you do that? Do you dd everything to the tape? What happens if a single file is rewritten? Do you overwrite everything on the tape? – Nikolaidis Fotis Nov 18 '14 at 13:55
  • What's important here is that I am not going to rsync anything to tape, only to the file system (on disk). However, I can off-line already existing data from disk to tape. The file system structure will be kept on disk to allow browsing. If rsync uses the delta algorithm, I believe it will attempt to recall files from tape to disk in order to figure out which parts of the file have changed. I want to avoid this because, once the data is off-lined, recalling it from tape will kill the system. So my question is whether using the -W option for rsync will avoid this problem. – repelente Nov 18 '14 at 14:04

1 Answer


Based on your comments, it sounds like you've got a hierarchical storage management (HSM) system that automatically restores files from tape when they're accessed. (You don't specifically state this in your question, but your comment "...I believe it will attempt to call files from tape back to disk" suggests it.)

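The -W (--whole-file) argument disables rsync's delta-transfer algorithm, so any file that is transferred is sent whole. It does not change which files rsync decides to transfer, so it would have no effect on the problem you're trying to avoid.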

By default, rsync uses the modification timestamp and file size as a test to decide which files have changed. If your HSM maintains the file size and timestamp on the "stub" files (usually done via sparse files), then rsync shouldn't attempt to perform delta copies of the stubbed files. If, however, the sizes and timestamps don't match, then rsync is going to assume the files have changed and attempt to do a copy.
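To make the size-and-timestamp test concrete, here is a minimal Python sketch of that kind of comparison. It is only an approximation of rsync's default behaviour, not its actual implementation: the function name `quick_check_unchanged` and the `modify_window` parameter (loosely modelled on rsync's `--modify-window` option) are my own, and it compares whole seconds only. It reads metadata via `os.stat()` and never opens the file contents; whether a metadata-only lookup avoids a recall on your particular HSM is an assumption you'd have to verify.

```python
import os


def quick_check_unchanged(src_path, dst_path, modify_window=0):
    """Return True if both files have the same size and (near-)equal mtime.

    Approximates the default "quick check" idea: only metadata is read,
    file contents are never opened.
    """
    try:
        src = os.stat(src_path)
        dst = os.stat(dst_path)
    except FileNotFoundError:
        # A file missing on either side cannot be considered unchanged.
        return False

    same_size = src.st_size == dst.st_size
    # Compare whole seconds; modify_window tolerates small clock differences.
    same_mtime = abs(int(src.st_mtime) - int(dst.st_mtime)) <= modify_window
    return same_size and same_mtime
```

If this returns True for a stub and its source file, a default rsync run would be expected to skip that file; if the HSM reports a different size or timestamp for the stub, expect a transfer attempt regardless of `-W`.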

I don't see any functionality in rsync that would allow for automatic exclusion of files that are stubbed out to your HSM. (I don't know what operating systems are at play, either. Windows, for example, has a file attribute that identifies files stubbed out to HSM.)

If your stub files don't have the proper timestamps and sizes then your best bet would probably be to generate an exclusion list of files that have been stubbed-out and use that to exclude files from rsync.
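A rough sketch of how such an exclusion list could be generated, in Python. Everything here is hypothetical: `looks_like_stub` guesses that a stub is a file that reports a size but has no allocated blocks, which is only a heuristic for sparse-file stubs and may not match how your HSM marks off-lined files, so the detection function is passed in as a parameter and can be swapped for whatever your HSM provides. The output is one root-anchored path per line, in the form rsync's `--exclude-from` option accepts.

```python
import os


def looks_like_stub(path):
    """Heuristic (an assumption): a stub reports a size but holds no blocks."""
    st = os.stat(path)
    # st_blocks is not available on every platform; default to "not a stub".
    return st.st_size > 0 and getattr(st, "st_blocks", 1) == 0


def write_exclude_list(root, out_path, is_stub=looks_like_stub):
    """Write the paths of stubbed files under root, one per line.

    Paths are relative to root with a leading "/", which anchors each
    pattern at the top of the transfer. Returns the number of entries.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                if is_stub(full):
                    rel = os.path.relpath(full, root)
                    out.write("/" + rel.replace(os.sep, "/") + "\n")
                    count += 1
    return count
```

The list would be built on the side where the stubs live and then handed to something like `rsync -a --exclude-from=stubbed.txt /data/ remote:/backup/` (paths are placeholders). Note that file names containing wildcard characters such as `*`, `?` or `[` would be treated as patterns by rsync and would need escaping.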

Evan Anderson
  • Evan, thanks. You are right in your assumption. I thought about your solution. The data never changes, though, which is why I thought sending the whole file would suffice here, rather than making an exclusion list of several hundred thousand files. An option would be to rsync only files newer than x days, but I haven't yet seen a method for this in rsync. – repelente Nov 19 '14 at 10:48
  • @repelente - The `-W` argument won't help you at all with your HSM problems. If the HSM isn't putting accurate timestamps and sizes on the stubbed files, rsync will still try to copy them again, `-W` argument notwithstanding. – Evan Anderson Nov 19 '14 at 12:34
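
On the "only files newer than x days" idea raised in the comments: as far as I know rsync has no age filter of its own, but it does accept a list of files to transfer via `--files-from`. A minimal Python sketch of building such a list on the source side follows; the function name `files_newer_than` and its parameters are hypothetical, and it relies on the statement that the data never changes after creation, so modification time stands in for creation time.

```python
import os
import time


def files_newer_than(root, days, now=None):
    """Return sorted paths (relative to root) modified within the last `days`."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # seconds per day
    result = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.stat(full).st_mtime >= cutoff:
                result.append(os.path.relpath(full, root))
    return sorted(result)
```

Writing the returned paths to a file, one per line, and passing it as `rsync -a --files-from=recent.txt /data/ remote:/backup/` (placeholder paths) would restrict the run to recent files, so older, possibly off-lined files are never examined.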