
I would like to have a volume spanning several SSDs or HDDs in my server. When I write a file to this volume, the file is written in whole to one of those drives, chosen randomly or in round-robin fashion. If a block of a drive fails, I lose the one file it holds. If a drive fails altogether, I lose all files written to it, but the volume is still available and contains the files from the other drives. Clearly, RAID doesn't fit the bill here. The task seems pretty basic though - can someone point me in the right Linux direction?

wick
  • Why wouldn't you just use the appropriate RAID solution? – ewwhite Aug 09 '15 at 20:38
  • There is no appropriate RAID solution? – wick Aug 09 '15 at 20:46
  • What are you really trying to do? – ewwhite Aug 09 '15 at 20:46
  • intensively read and write millions of small files: load balance across several SSDs in a server – wick Aug 09 '15 at 20:49
  • Can you give context? Capacity needs, hardware involved, what type of application this is? Because otherwise, it's a bit difficult to answer. – ewwhite Aug 09 '15 at 20:51
  • It is because you are trying to answer some other question I didn't ask :-) Let's assume it is a theoretical question. RAID will either bring down the whole volume (0) or eat at performance if used with parity (5, 6). And I want neither. – wick Aug 09 '15 at 20:59
  • Why would you pick either of those RAID levels? RAID 10 and hot spares will mitigate multiple disk failures. Let's not get into a "which RAID is best" discussion though. Your real question should be what level of IOPS you need and what percentage is read vs. write, and then design your storage requirements around that. – Drifter104 Aug 09 '15 at 21:08
  • In order to have good answers, don't you have to ask real questions? Otherwise, whatever answer is given could in theory change the scope, as there is nothing to pin the real question to. – albal Aug 10 '15 at 01:54

4 Answers


I'm not sure why you'd want this over traditional RAID, but perhaps something like the copies= property of the ZFS filesystem could be useful to you.
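
For illustration, a minimal sketch, assuming a pool named tank built from two hypothetical disks. copies=2 makes ZFS keep two copies of every block, spread across devices where possible, so a single bad block need not cost you the file:

    # Striped pool, no vdev-level redundancy (device names are examples)
    zpool create tank /dev/sdb /dev/sdc

    # Store two copies of each block, placed on different disks where possible
    zfs set copies=2 tank

Note that this protects individual files against block-level failures, but unlike the setup the question asks for, losing an entire disk can still bring down the whole pool, since the pool itself is striped.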

ewwhite

So you want to automatically distribute data between physically different file systems without providing redundancy for the data?

Linux does not have a built-in method for this. You can use MD or ZFS to set up RAID, but automatic distribution and tracking of files between different file systems does not exist at that level. This would be an application-level (i.e. userland) function; you will need to look for an application that does this or write your own.

For example, Apache Cassandra supports having multiple data directories/locations assigned to it, typically each defined on a different file system. Cassandra keeps track of what data is where and tries to distribute the data evenly; there is no redundancy between these locations on the local server, because Cassandra replicates across the network instead.
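
For instance, in cassandra.yaml this is the data_file_directories setting; a sketch with hypothetical mount points:

    # cassandra.yaml: one data directory per file system
    data_file_directories:
        - /mnt/ssd1/cassandra/data
        - /mnt/ssd2/cassandra/data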

You might be able to use GlusterFS to this effect: create a distributed volume on a single server with multiple bricks (each brick being a different file system), then mount the volume locally, as sketched below. I've never tried this, so your mileage may vary.
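
Untested, but roughly, a single-node distributed volume might look like this (hostname, volume name, and brick paths are all made up; GlusterFS may require appending force if the bricks sit on the root file system):

    # Two bricks on the same server, each on its own file system
    gluster volume create distvol myserver:/bricks/ssd1 myserver:/bricks/ssd2
    gluster volume start distvol

    # Mount the combined volume locally
    mount -t glusterfs myserver:/distvol /mnt/distvol

Each whole file lands on exactly one brick (GlusterFS distributes by filename hash rather than randomly or round-robin), which does satisfy the "whole file on one disk" requirement.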

Gene

My impression is that you're looking for a union filesystem, where you have two (or more) disks, each with their own file system:

/hdd1             /hdd2
|                 |
+-- /dir1         +-- /dir1
|   |             |   |
|   +- file2      |   +- file4
|                 |   +- file2
+-- file1         |
|                 +-- file5
+-- /dir2         |
    |             +-- /dir3
    +- file3          |
                      +- file6

which you combine in a single view/overlay that is the union, the combination, of the two:

/hdd_common
|
+-- /dir1
|   |
|   +-- file2  
|   +-- file4
|
+-- /dir2
|   |
|   +-- file3
|
+-- /dir3
|   |
|   +-- file6
|
+-- file1
+-- file5

A FUSE-based example, and the source of the ASCII art above, is mhddfs; the Wikipedia article on union filesystems lists a couple more.
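
For reference, an mhddfs mount of the two example disks would look roughly like this (the mlimit option is the free-space threshold: once a drive's free space drops below it, mhddfs moves on to the next drive):

    # Combine /hdd1 and /hdd2 into a single union mount at /hdd_common
    mhddfs /hdd1,/hdd2 /hdd_common -o mlimit=4G,allow_other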

That meets your requirement of always having the whole file on a single disk, and unlike a JBOD array, a single disk failure won't result in the loss of your complete dataset. There is, however, no guarantee that writes will be distributed evenly.

HBruijn

As @HBruijn pointed out, this is best solved via FUSE. The reason is that your storage system is built in layers. RAID and JBOD work below the file system layer, so they have no idea what a "file" actually is; they work in disk blocks, so they cannot guarantee that a file won't be split up. A filesystem can't do this task on its own without significant complexity, since its metadata would need to be split between multiple disks and would need to survive losing one of those drives. That leaves the layer above the filesystem, where we keep a simple mountable file system on each drive, and FUSE as the solution.

However, Unionfs always writes to a specific location, not randomly or round-robin, so it doesn't fit the bill; it should be noted specifically that this is the case. The reference made by @HBruijn to mhddfs is mostly correct, but mhddfs doesn't actually load balance in any of the ways the OP asked for (although this was only implied by the mention of random and round-robin, not specifically stated). If a sufficiently low limit is set, mhddfs writes to the drive with the most available space, which could always be the same drive if you have one large drive and one small one (no load balancing). Of course, since it's FUSE, it would be trivial to change the source to do round-robin (remembering to round-robin across all drives whose remaining free space is large enough for the file being written, rather than all drives in total), as in the sketch below.
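
A minimal sketch of that selection policy in Python (a real patch would live in mhddfs's C source; the drive paths here are hypothetical, this only illustrates the logic):

    import itertools
    import os

    DRIVES = ["/hdd1", "/hdd2"]  # hypothetical mount points of the member drives
    _cycle = itertools.cycle(range(len(DRIVES)))

    def pick_drive(file_size):
        """Round-robin across drives, skipping any without room for this file."""
        for _ in range(len(DRIVES)):
            drive = DRIVES[next(_cycle)]
            stats = os.statvfs(drive)
            free_bytes = stats.f_bavail * stats.f_frsize
            if free_bytes >= file_size:
                return drive
        raise OSError("no drive has enough free space for this file")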

Evan Langlois