
I have disks as below on my Debian server:

/dev/sda1  276M   29M  233M  11% /boot
/dev/sdb1  917G  793G   79G  92% /home
/dev/sdc1  1.8T  1.7T   79G  96% /home2
/dev/sdd1  1.8T  1.7T   79G  96% /home3

Is it possible to access /dev/sdb1 /dev/sdc1 /dev/sdd1 as a single partition so that:

a) I can access them from a single mount point like /bighome, while files are automatically saved across the disks transparently to my scripts?

b) Can this be achieved without losing the existing data on the server?

iTech

3 Answers


a) Yes, that is what RAID or LVM striping does. Beware, however, that if you build RAID 0 or an LVM stripe and one of your disks fails, all of the data on it is gone. To overcome this you need redundancy; RAID 5 or RAID 6 would be fine options. But to build RAID 5, you'd better have all the disks the same size...

b) No, as far as I know there is no way to preserve the existing data while creating a RAID array or LVM setup. You need to back them up.
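To make a) concrete, a minimal sketch of the LVM-striping variant might look like the following. The volume group and logical volume names (bigvg, bighome) are just placeholders, and these commands destroy whatever is currently on the partitions, which is exactly why the backup in b) has to come first:

    # ASSUMPTION: all three partitions have been backed up and may be wiped
    pvcreate /dev/sdb1 /dev/sdc1 /dev/sdd1
    vgcreate bigvg /dev/sdb1 /dev/sdc1 /dev/sdd1
    # -i 3 stripes the volume across all three PVs (RAID 0-like, no redundancy)
    lvcreate -i 3 -l 100%FREE -n bighome bigvg
    mkfs.ext4 /dev/bigvg/bighome
    mount /dev/bigvg/bighome /bighome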

Jihun

Yes, you can. Contrary to common belief, there is no need to reformat.

There are filesystems that do exactly what you want. The ones I can remember are unionfs, aufs and overlayfs. The last one is used on every Ubuntu live/install DVD.

These can work because they operate at the level of the filesystem rather than the block devices.
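As a rough illustration (the directory names are only examples), an overlayfs mount over the existing data might look like the sketch below. Note that overlayfs keeps the lower layers read-only and writes every new file into the single upperdir, so it merges the namespaces rather than spreading new data across the disks:

    # workdir must be an empty directory on the same filesystem as upperdir
    mkdir -p /bighome /home/.overlay/upper /home/.overlay/work
    mount -t overlay overlay \
        -o lowerdir=/home2:/home3,upperdir=/home/.overlay/upper,workdir=/home/.overlay/work \
        /bighome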

peterh

As already pointed out there are a couple of ways to do this:

  • A union filesystem, though these are usually intended for fairly specific use cases (i.e. offering a writeable version of a readonly filesystem or providing a local fast cache for a slow remote filesystem) so are probably not best suited to this situation

  • LVM striping

  • RAID in various forms

Of these, the one you should go for is RAID, using RAID 5 or Linux's special 3-drive RAID10 (which is essentially what IBM hardware RAID controllers call RAID 1E) - that way if one drive dies your data is safe, so you can plug in another drive and recreate the array. With the other options, if one drive experiences trouble you potentially lose all the data on all three drives. The choice between RAID 5 and RAID10 depends on what the system is used for. With RAID 5 you'll end up with a 3.6T volume but there are write performance issues that affect some use cases (like heavy database work); with 3-drive RAID10 you'll get the same or sometimes better performance (much better for some write-heavy workloads) but the usable space will only be 2.7T.

You could use RAID 0 of course, but that would have the same "one dies, all the data is gone" problem.
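For reference, and assuming the three partitions have already been emptied and backed up (and that /bighome exists), creating either array with mdadm looks roughly like this; the md device name and filesystem are just conventional examples:

    # RAID 5 across the three partitions (~3.6T usable)
    mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
    # ...or the 3-drive RAID10 variant (~2.7T usable)
    # mdadm --create /dev/md0 --level=10 --raid-devices=3 /dev/sdb1 /dev/sdc1 /dev/sdd1
    mkfs.ext4 /dev/md0
    mount /dev/md0 /bighome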

Migration without dropping the data is possible (as two of your drives are starting empty) but not recommended, and it still requires some downtime (or at least time when the data is read-only); a rough command-level sketch follows the numbered steps:

  1. Create the RAID array in a degraded state using the two drives (i.e. your two-drive RAID 5 array behaves as if a drive has failed)
  2. Stop your users writing to the existing volume (or just block access to it completely)
  3. Copy the data over to the new array
  4. Change the mounting details over so the new copy is the active copy.
    (At this point you can re-enable write access.)
  5. Remove the old filesystem
  6. Add the now-empty third device to the RAID array as a replacement for the "failed" drive, and the array will rebuild onto it (this may take some time on drives this large, especially if the data is in active use at the same time)
  7. Once the array rebuild is complete (you can monitor the progress via /proc/mdstat) all is done: you are using all three drives and have protection against any one of them failing at any given time.
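A hedged sketch of those steps with the md tools is below; the device names are placeholders (use whichever two drives you can empty first) and the filesystem choice is only an example:

    # Step 1: degraded RAID 5, with the third member marked "missing"
    mdadm --create /dev/md0 --level=5 --raid-devices=3 missing /dev/sdX1 /dev/sdY1
    mkfs.ext4 /dev/md0
    mount /dev/md0 /bighome
    # Steps 3-4: copy the data across, then repoint /etc/fstab at the new volume
    rsync -aHAX /home/ /bighome/
    # Steps 5-6: retire the old filesystem and add its drive to the array
    umount /home
    mdadm --add /dev/md0 /dev/sdZ1
    # Step 7: watch the rebuild
    cat /proc/mdstat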

Before doing the above I strongly recommend making sure your backups are bang up-to-date and tested, in case something goes wrong. That being the case, it is probably faster & safer to update & verify your backups, build the array normally, and restore the data from the latest backup.

David Spillett