
On our Isilon cluster, we have a 124 TB file system. It is currently 38 percent full, with 31 million files. About half the data are image files, and the mean file size is 1.5 MB. We use snapshots to protect against accidental deletion, but we need something different to protect against total failure (e.g., sysadmin error, software error, or water, heat, or fire damage). And because we're a poor research lab, it shouldn't be too expensive.
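The figures above are consistent with each other: 38 percent of 124 TB spread over 31 million files gives a mean of about 1.5 MB per file. A quick check, assuming decimal units (1 TB = 10^12 bytes; the cluster may report binary TiB instead):

```python
# Back-of-the-envelope check of the quoted figures (assumes decimal units).
capacity_tb = 124
fraction_full = 0.38
n_files = 31_000_000

used_tb = capacity_tb * fraction_full            # data actually stored, in TB
mean_file_mb = used_tb * 1e12 / n_files / 1e6    # average bytes per file, in MB

print(f"used: {used_tb:.1f} TB, mean file size: {mean_file_mb:.2f} MB")
```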

We currently try to back up to tape, but that has two problems. First, just traversing the directory tree and stat-ing each file takes more than five days, so even an incremental backup takes over a week. Second, and most important, a restore would take many weeks, even months.
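Expressed as a rate, the current walk manages at most about 72 files per second; finishing the same scan within one day would need roughly 359 files per second, about five times faster:

```python
SECONDS_PER_DAY = 86_400
n_files = 31_000_000
scan_days = 5  # "more than five days", so the observed rate is an upper bound

observed_rate = n_files / (scan_days * SECONDS_PER_DAY)  # stat calls per second today
rate_for_one_day = n_files / SECONDS_PER_DAY             # needed for a nightly scan

print(f"observed: <= {observed_rate:.0f} files/s")
print(f"needed for a 24 h scan: {rate_for_one_day:.0f} files/s")
```

Whether parallel walkers (for example, several NFS clients each scanning different top-level directories) could reach that rate on this cluster is an open question; these numbers only show the size of the gap.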

Ideally, we'd like to have access to much of the data again within a week of a disaster. (It's fine to get the data back gradually over the course of several weeks if we can choose which directories to restore first, but sourcing new storage equipment and restoring would likely take much longer than that.) The only way I can think of to recover within a week is to maintain a replicate on disk at a separate location. It's OK to lose at least a few days of work, so the replication can lag a little or cover the file system over the course of several days. It's also OK for the replicate to have much poorer performance than the original.
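One low-cost way to keep such a lagging replicate is to record each file's size and modification time, then copy only what differs since the previous run. The following is a minimal Python sketch, not a tested tool; the manifest format and the names `build_manifest` and `diff_manifests` are hypothetical. It still stats every file, so it does not remove the multi-day traversal; it only limits how much data has to be copied to the replicate.

```python
import os
from typing import Dict, Set, Tuple

# Hypothetical manifest format: relative path -> (size in bytes, mtime in ns).
Manifest = Dict[str, Tuple[int, int]]

def build_manifest(root: str) -> Manifest:
    """Walk `root` iteratively and record size and mtime of every regular file."""
    manifest: Manifest = {}
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.is_file(follow_symlinks=False):
                    st = entry.stat(follow_symlinks=False)
                    rel = os.path.relpath(entry.path, root)
                    manifest[rel] = (st.st_size, st.st_mtime_ns)
    return manifest

def diff_manifests(old: Manifest, new: Manifest) -> Tuple[Set[str], Set[str]]:
    """Return (paths to copy to the replicate, paths to delete from it)."""
    to_copy = {path for path, meta in new.items() if old.get(path) != meta}
    to_delete = set(old) - set(new)
    return to_copy, to_delete
```

Between runs, the manifest could be saved (for example with `pickle`, which keeps the tuples intact) and the changed paths handed to a copy tool such as rsync. Because `build_manifest` takes a root argument, different top-level directories could be scanned on different nights, spreading the walk over several days.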

The Isilon solution would be to use SyncIQ to replicate the file system to another cluster. Because this operates at the block level, it avoids the problem of traversing the file system and stat-ing each file. As can be expected, the cost is a little steep: the license for the SyncIQ software is $55k, and then there is the cost of the expensive Isilon storage to synchronize to (although using their cheaper NL storage helps a bit). I expect that the Isilon solution will come to somewhere between $500 and $1000 per TB, which is far better than the $1300–1900/TB we paid for the primary storage, but still a lot of money for us.
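Multiplying the quoted per-TB figures by the 124 TB capacity gives the totals involved. This assumes the estimates apply to the full capacity rather than only the 38 percent in use; the license share is simply $55k divided by 124 TB.

```python
CAPACITY_TB = 124
SYNCIQ_LICENSE = 55_000          # USD, as quoted above
primary_per_tb = (1300, 1900)    # USD/TB paid for the primary storage
replica_per_tb = (500, 1000)     # USD/TB estimated for the SyncIQ solution

primary_total = tuple(p * CAPACITY_TB for p in primary_per_tb)
replica_total = tuple(p * CAPACITY_TB for p in replica_per_tb)
license_per_tb = SYNCIQ_LICENSE / CAPACITY_TB

print(primary_total)             # (161200, 235600)
print(replica_total)             # (62000, 124000)
print(round(license_per_tb))     # 444
```

By this arithmetic the license alone is about $444/TB, most of the lower end of the $500 to $1000/TB range.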

Given that raw hard drives can be had for $60/TB these days, I would hope that 124 TB of slow storage can be cobbled together for far below Isilon prices, and that there is a way to replicate changes in less than a week. Can you think of a way?
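For comparison, the raw-drive cost works out as follows. The RAID-6-style layout of 8 data plus 2 parity drives is an assumption chosen only for illustration, and the result covers drives alone: no enclosures, servers, spares, power, or space at the second site.

```python
TB_NEEDED = 124
DRIVE_COST_PER_TB = 60           # USD/TB, as quoted above
# Assumption: RAID-6-style groups of 8 data drives + 2 parity drives.
DATA_DRIVES, PARITY_DRIVES = 8, 2

raw_tb = TB_NEEDED * (DATA_DRIVES + PARITY_DRIVES) / DATA_DRIVES
drive_cost = raw_tb * DRIVE_COST_PER_TB

print(raw_tb, drive_cost)        # 155.0 9300.0
```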

Vebjorn Ljosa
  • Whatever the solution is, I would strongly recommend one that does deduplication. – MDMarra Sep 02 '10 at 15:41
  • I'm not sure how you expect a backup solution to exist that can back up files faster than they can be read. – James L Sep 02 '10 at 15:46
  • @James Lawrie: I only expect it to read the _metadata_ (inodes or similar), in order to determine what has been changed and needs to be replicated (usually not much). On a Unix machine, that is what dump would do. I have no idea whether it's possible for third-party solutions on Isilon. – Vebjorn Ljosa Sep 02 '10 at 15:55
  • @MarkM: Most of the files are microscopy images, and we're pretty good about not keeping multiple copies of them. – Vebjorn Ljosa Sep 02 '10 at 15:57
    Good deduplication works at the block level. So it wouldn't need an entire image duplicated to save space. Having plenty of similarly styled files may be all you need. He likely mentioned it because 124TB is a great deal of space, and deduplication could save you some cash. And in all honesty, is $55K really that much to ask for a guarantee of your critical data, especially 124TB worth? – Christopher Karel Sep 02 '10 at 16:46
    $55k + storage sounds cheap for what you need. Moving away from the Isilon family might require a lot of time and effort (money). Most alternatives will simply traverse the directory tree because they have no knowledge of the Isilon internal proprietary product (at the block level and below). – Stefan Lasiewski Sep 02 '10 at 17:15
    @Christopher Karel: Once you add the cost of storage, it approaches the salary of one of the eight users—and we can't agree on whom to fire. ;p – Vebjorn Ljosa Sep 02 '10 at 17:36
  • @Vebjorn - What @Christopher said. Any dedupe worth its salt works at a block level, not file level. With 124TB of data, there will still be significant space saving in all likelihood. – MDMarra Sep 02 '10 at 18:41
  • I'm amused by wanting a cheap recovery solution for what's really a very expensive storage platform. Just seems ironic. – Tom O'Connor Sep 02 '10 at 19:13
  • I hate to say it but if you can't afford to back it up then you can't really afford to purchase it in the first place. If your data is important enough to warrant this kind of care in the first place then it must be important enough to find the money for the recommended backup solution. – Rob Moir Sep 02 '10 at 20:05
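The block-level point made in these comments can be illustrated with a small Python sketch: split each file into fixed-size blocks, hash each block, and count the distinct hashes. The 4 KiB block size and fixed-size chunking are assumptions for illustration (real products may use other block sizes or variable-size chunking), and `dedup_stats` is a hypothetical helper. Whether this would save much on microscopy images is not something the sketch can answer; running it over a sample of real files would.

```python
import hashlib
from typing import Iterable, Tuple

BLOCK_SIZE = 4096  # assumption: fixed-size 4 KiB blocks

def dedup_stats(files: Iterable[bytes], block_size: int = BLOCK_SIZE) -> Tuple[int, int]:
    """Return (total blocks, unique blocks) across all file contents."""
    seen = set()
    total = 0
    for data in files:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            seen.add(hashlib.sha256(block).digest())  # identical blocks hash identically
            total += 1
    return total, len(seen)

# Two files that differ as wholes but share a block of identical content.
a = b"A" * 8192 + b"B" * 4096
b = b"A" * 4096 + b"C" * 4096
print(dedup_stats([a, b]))  # (5, 3)
```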

1 Answer


I work at a shop that runs an Isilon cluster as well; I haven't really touched it much, so I can't say TOO much about the details.

But the way we have it set up, we do indeed back up to tape; we have a tape robot, so we don't have to deal with switching cartridges all the time (which I suppose makes long backups a lot more tolerable). We also opted for the more expensive X-series Isilon nodes and just got a bunch of them; yes, there is less storage per node, but it also allows a bit more tolerance for failure.

  • Do you know the size of the filesystem and the number of files? Is there anything Isilon-specific about the backup system? – Vebjorn Ljosa Sep 02 '10 at 17:08
  • We have roughly 400 TB of storage or so. Nothing really Isilon specific - if I were to guess, the Overland tape changer is either mounting the Isilon system through NFS, or there's a dedicated machine that the Overland tape changer is attached to. – Christian Paredes Sep 02 '10 at 20:48
  • I don't know how many files we have; again, I haven't really touched our Isilon cluster. :) – Christian Paredes Sep 02 '10 at 20:49