
I have the following situation:

  • Need to store 1 original + 1 copy for data sets that are 1-2 TB in size
  • Need to use consumer SATA drives, 1 for original, 1 for copy
  • Need to store 50+ such data sets
  • Data sets are ~rarely accessed / changed so drives will be mostly stored in boxes, offline
  • When a data set needs to be accessed / changed, its drives will be connected to a server, using hotswap SATA hdd racks (e.g.)
  • The server needs to expose the drives via network shared folders, over CIFS / SMB and should preferably run Windows

Problem: How do I maintain the mirror between the 2 drives in the most reliable and convenient manner?

What will not work for me:

  • NO hardware RAID1. Having a single point of failure in the controller is not acceptable and I doubt any RAID controller would seamlessly support my usage pattern.
  • NO software RAID1. AFAIK hotswapping is an issue with software RAID.
bogdan
  • I don't understand why this question was downvoted and put on hold. How is mirroring two drives "off topic" when you say "we're working together to build a library of detailed answers to every question about professional server, networking, or related infrastructure administration". Why does it bother people enough to downvote it? – bogdan Aug 13 '14 at 15:34
  • Your requirements are a bit sophisticated. This is not to say there is anything wrong with your requirements, they mostly make sense. But there is of course the question about how such a system should deal with write attempts happening while only one drive is connected. Or even worse, different writes happening with the two drives connected to different machines. There isn't going to be lots of software supporting your scenario, so restricting yourself to one vendor doesn't sound like a good idea. – kasperd Aug 19 '14 at 21:12

2 Answers


Since the "Windows" part is only a preference, not a strict requirement, I dare to suggest using ZFS.

It will run on Linux (even if performance is not stellar... yet), FreeBSD, or illumos.

I have a very similar workflow for my raw photos at home; I do it with USB 3 sticks, but the idea is the same.

1) Create a mirrored zpool. Mine looks like this:

# zpool status zmirrusb
  pool: zmirrusb
  state: ONLINE
  scan: resilvered 18.4M in 8h1m with 0 errors on Sun Jul 28 23:55:06 2013
  config:

    NAME                                                 STATE     READ WRITE CKSUM
    zmirrusb                                             ONLINE       0     0     0
      mirror-0                                           ONLINE       0     0     0
        usb-SanDisk_Cruzer_Fit_4C532000060405101492-0:0  ONLINE       0     0     0
        usb-SanDisk_Cruzer_Fit_4C532000000405100343-0:0  ONLINE       0     0     0
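
For reference, a two-way mirror like the one above can be created with something along these lines (the by-id device paths are placeholders for your actual drives; addressing them via /dev/disk/by-id means the pool does not care which port or order the drives come back on after reconnection):

```shell
# Create a mirrored pool from two drives, addressed by stable by-id names.
# The device ids below are placeholders - substitute your actual SATA drives.
sudo zpool create zmirrusb mirror \
    /dev/disk/by-id/ata-DRIVE_A_SERIAL \
    /dev/disk/by-id/ata-DRIVE_B_SERIAL
```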

2) Copy your data to it

3) Export the pool

# zpool export zmirrusb

4) Disconnect your storage and store it somewhere safe

5) When you need to access the data, reconnect the storage

6) Import the pool and mount the filesystem

# zpool import zmirrusb
# zfs mount -a

Now you can export the volumes the way you prefer.

ZFS supports exporting via Samba (CIFS) out of the box. I never tried it myself, but something like this should give you an idea:

7) Export via Samba

# zfs list -r zmirrusb
NAME             USED  AVAIL  REFER  MOUNTPOINT
zmirrusb        19.1G  10.2G  5.84G  /zmirrusb
zmirrusb/stuff  68.2M  10.2G  1.13M  /zmirrusb/stuff

# zfs set sharesmb=on zmirrusb/stuff

8) Browse the network.

Note that depending on which SMB sharing features you need, you might have to edit the Samba configuration itself: ZFS only makes the specific filesystem available as a share and won't deal with other things like authentication and authorization.
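
If you go the Samba route, a quick sanity check might look like this (assuming Samba is installed and running; the username is a placeholder):

```shell
# Give an existing system user a Samba password (placeholder username).
sudo smbpasswd -a archiveuser
# Validate the effective Samba configuration.
testparm -s
# List the shares visible on the local server to confirm the ZFS share shows up.
smbclient -L localhost -U archiveuser
```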

Your requirements seem pretty strict and, I guess, you are trying to keep the budget as low as possible, but have you considered some different kind of storage that would not require so much manual work?
For this kind of archival purpose an object-based storage service seems a perfect fit; a couple of examples:

GlusterFS - HA network storage with the ability to export over SMB/NFS out of the box.

Swift - OpenStack's S3-like object storage service, accessible via an HTTP API (and a million FUSE-based projects to make it work like a filesystem).

Or, if you can really predict the usage patterns of your data, look at AWS Glacier, which is extremely cheap but does not give you instant access to your data and requires you to re-download the data whenever you need it.

przRocco
  • Thank you for this valuable reply. I'm hoping to avoid adding *nix into the mix to solve just this one problem. I will mark your answer as such if nobody adds a Win based solution. I would even be willing to consider a TaskSched job running something like robocopy which synchs disk A to disk B while leaving only disk A shared over the network. It would be ugly but much easier than switching the server to *nix. AFAIK, ZFS is not production level reliable on Linux and it's not clear what happens if disk B breaks down or if I forget to plug it in and it becomes out of synch by mistake. – bogdan Aug 09 '14 at 09:40
  • @bogdan You could put a Windows-friendly front-end on the ZFS filesystem (Samba or similar)... – voretaq7 Aug 27 '14 at 02:41
  • @voretaq7 I'm trying to avoid even raising the problem of changing OS on the server where these operations occur. The impact would be huge. I'm going to mark the above answer as such and proceed looking for another solution on softwarerecs.stackexchange.com in the hope there's some kind of Windows tool out there that covers this usage scenario. – bogdan Aug 27 '14 at 08:30

Typically, hot-swap hard disk brackets are intended for incidental maintenance, not for daily use.

My recommendation would be to use external USB drives. That gives you the benefit of some extra protection for the disks, as well as a connector that is designed for repeated unplugging and reinsertion. In addition, it isn't dependent on "exotic" hardware, making your solution more future-proof.
With USB 3 the limiting factor will be the read/write performance of the actual disk, not the USB link, and you should get performance similar to what you would get from the hot-swap SATA port.

Regardless, the tried and true method of keeping two data sets synchronised is rsync. Expose one disk to the network and sync that one to its backup.

HBruijn
  • My situation is such that I'm limited to USB 2 and any given HDD would be plugged in maybe once a month. It's more likely for the rack to break and that's no problem because it's cheap. Regarding rsync, the problem is that I haven't been able to find something similar built from the ground up for Windows. There are rsync ports for Windows but they feel very hackish / not production ready. – bogdan Aug 09 '14 at 12:05