
We have millions of files in lots of directories, for example:

\00\00\00\00.txt
\00\00\00\01.pdf
\00\00\00\02.html
... and so on
\05\55\12\31.txt
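
For illustration, here is a minimal sketch of how a file ID could map to this nested layout, assuming each path encodes an eight-digit ID split into two-digit segments (that scheme is my assumption for the example, not something guaranteed by the paths above):

```python
import os

def id_to_path(file_id: int, ext: str, root: str = ".") -> str:
    """Map an 8-digit ID to a nested path, e.g. 5551231 -> 05/55/12/31.txt"""
    digits = f"{file_id:08d}"                          # zero-pad to 8 digits
    parts = [digits[i:i + 2] for i in range(0, 8, 2)]  # split into 2-digit segments
    return os.path.join(root, *parts[:-1], parts[-1] + ext)

print(id_to_path(0, ".txt"))        # ./00/00/00/00.txt
print(id_to_path(5551231, ".txt"))  # ./05/55/12/31.txt
```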

Backing these up to tape is slow, because backing up data in this format is much slower than backing up a single large file.

"The total number of files on a disk and the relative size of each file impacts backup performance. Fastest backups occur when the disk contains fewer large size files. Slowest backups occur when the disk contains thousands of small files." (Backup Exec Admin Guide)

Would backup performance increase significantly by creating a virtual hard disk (VHD), hosting the data on it once mounted, and then backing up the VHD file instead?

I'm unsure whether the underlying data within the VHD would affect this.

What are the drawbacks to this method?

Mark Price
  • Most backup software allows you to run backups to a hard-disk-based staging pool and then relocate those jobs to tape. In this case, the backup archives are created on disk, which is much better suited to this, and then large archive files are written to tape. – EEAA Aug 17 '14 at 22:24
  • What operating system and filesystem are you writing about? – ewwhite Aug 17 '14 at 22:54
  • `1.` A backup to disk job is probably going to be faster than a direct backup to tape job. You can then configure/run a duplicate job, which will backup the backup to disk files to tape. `2.` Yes, hosting the files on a VHD and backing up the VHD should be faster. You'll need to make sure that the backup product you use to back up the VHD allows for file level restores from the VHD. – joeqwerty Aug 17 '14 at 22:57
  • Why would backup products use the hard drive as a staging area? Surely they would use RAM? I'm only interested in restoring to a point in time, not individual files. I may do an experiment... – Mark Price Aug 18 '14 at 08:04
  • Is this on Windows? If you had access to ZFS you could send/receive snapshots. – ptman Aug 18 '14 at 10:42
  • Do you have several terabytes of RAM? – Michael Hampton Aug 18 '14 at 11:35

2 Answers


Storing lots of small files in a file system which is itself kept as a single file does have some potential benefits.

If the format of this file is sparse, then the backups will initially be faster. However, as time passes and files are created and deleted, the image may not remain as sparse. Eventually the image may end up being much larger than the files within it, which wastes space on both disk and tape and slows down backups compared to when the image was new.
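
Here is a small sketch showing the difference between a sparse file's apparent size and its actual allocation (assuming a POSIX system; the st_blocks field is not available on Windows):

```python
import os

path = "sparse_demo.img"
with open(path, "wb") as f:
    f.seek(1024**3 - 1)  # seek out to 1 GiB - 1 byte without writing data
    f.write(b"\0")       # one real byte sets the file length to 1 GiB

st = os.stat(path)
print("apparent size:    ", st.st_size, "bytes")          # ~1 GiB
print("allocated on disk:", st.st_blocks * 512, "bytes")  # far less, if sparse
os.remove(path)
```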

Another drawback of the image is that if it is backed up while writes are being performed to the file system inside it, you may end up with a backup whose integrity is not preserved.

kasperd

I decided to test this myself.

For the test I created a 25GB VHD on Server 2008R2 and attached it.
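
For anyone wanting to reproduce that step, here is a rough sketch of scripting it with diskpart from Python (run elevated; the path, size in MB, volume label, and drive letter are placeholders I chose for the example):

```python
import os
import subprocess
import tempfile

# diskpart script: "maximum" is in MB (25600 MB = 25 GB)
script = (
    'create vdisk file="C:\\test.vhd" maximum=25600 type=expandable\n'
    'attach vdisk\n'
    'create partition primary\n'
    'format fs=ntfs quick label=vhdtest\n'
    'assign letter=V\n'
)

# diskpart reads its commands from a script file passed with /s
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write(script)
    script_path = f.name

try:
    subprocess.run(["diskpart", "/s", script_path], check=True)
finally:
    os.remove(script_path)
```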

I then populated it with 20GB worth of data. 129000 files in 1318 directories.
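
The test set was generated along these lines (a sketch only; the per-directory file count and file size are stand-ins I picked to land near the same totals, and the drive letter matches the previous sketch):

```python
import os

ROOT = r"V:\testdata"    # assumed mount point of the attached VHD
NUM_DIRS = 1318          # matches the directory count above
FILES_PER_DIR = 98       # 1318 * 98 is roughly 129,000 files
FILE_SIZE = 160 * 1024   # ~160 KB each, roughly 20 GB in total

for d in range(NUM_DIRS):
    dirpath = os.path.join(ROOT, f"{d:04d}")
    os.makedirs(dirpath, exist_ok=True)
    for n in range(FILES_PER_DIR):
        with open(os.path.join(dirpath, f"{n:04d}.bin"), "wb") as f:
            f.write(os.urandom(FILE_SIZE))  # incompressible payload
```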

I ran a backup job for the contents of the VHD, then detached the VHD and backed up the VHD file itself.

Below are the results.

Data          Elapsed   Byte Count  Job Rate
VHD           00:09:51  25.0 GB     14,222.00 MB/min
VHD Contents  00:07:38  20.2 GB     9,557.00 MB/min

The elapsed time is longer for the VHD file, but that job also moved more data (the full 25.0 GB container versus 20.2 GB of contents) at a much higher job rate. Scaled up to the actual sizes I'm dealing with, I'm confident the higher job rate will win out.

The VHD Contents job rate also seems higher than I would expect. It may be inflated by caching, since the files had only just been created, but I can't confirm this right now because the main job is bundled in with other backup data.

I don't have the time or the need to investigate this further at the moment, though I may revisit it in the future.

Mark Price