---- begin preamble ---
RAID is a way of combining lots of hard drives into a common storage pool, with the ability to tolerate loss/failure of one or more drives. If a RAID has no redundant drives configured (plain striping, for example), then the failure of a single drive results in loss of ALL data from ALL drives, even the good ones.

So normally RAID systems incorporate at least one extra drive for data redundancy. If a single drive goes bad in this scenario, all data is still available until the next drive failure.

Meanwhile, one should replace the failed drive ASAP, and then sit through a day-long "RAID rebuild" session, biting the remains of one's fingernails while hoping a second failure (or discovery of an existing bad sector) doesn't kill the rebuild and result in total data loss.

From this it becomes apparent that RAID is not a substitute for a full back-up, or even a more elaborate system of backups.

And even just a simple "software crash", aka "improper shutdown", will result in the RAID wanting to spend a day or more doing yet another "rebuild" or resynchronization of the array (assuming multi-terabyte drives).

You may have guessed that I don't like RAID. :-)
---- end preamble ---

In addition to a fragile running environment, RAID also makes it possible to have a single, LARGE filesystem spanning several physical drives. This is especially convenient for video (media) collections, with thousands of files, each of which is gigabytes in size. Shuffling those around manually among a collection of small filesystems is not a joyful exercise, so using a RAID to create larger filesystems is a pretty commonplace workaround.

I recently discovered a better (for me) way to do this. There's a Linux FUSE filesystem called mhddfs for these situations. The rather awkward name parses as Multiple Hard Disk Drive FileSystem, or mhddfs for short.

To use it, one formats drives (or partitions) individually, one filesystem on each. Then mount (or attach) the filesystems, and use mhddfs to combine (or pool) their storage into one massive higher layer filesystem.
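For anyone wanting to try the pooling step, here is a minimal Python sketch that only builds the command line. It assumes the usual mhddfs invocation of a comma-separated list of member directories followed by the pool's mount point. The paths /mnt/disk1, /mnt/disk2, /mnt/disk3 and /mnt/pool are made up for illustration, and the helper name build_mhddfs_command is hypothetical.

```python
import shlex


def build_mhddfs_command(branches, mount_point):
    """Return the argv list that pools `branches` at `mount_point`."""
    if not branches:
        raise ValueError("need at least one mounted filesystem to pool")
    # Assumed syntax: member directories as one comma-separated argument,
    # followed by the mount point of the pooled filesystem.
    return ["mhddfs", ",".join(branches), mount_point]


if __name__ == "__main__":
    # Hypothetical mount points; each one holds an ordinary,
    # separately formatted and already mounted filesystem.
    cmd = build_mhddfs_command(
        ["/mnt/disk1", "/mnt/disk2", "/mnt/disk3"], "/mnt/pool"
    )
    # Print the command instead of running it, so nothing gets mounted here.
    print(shlex.join(cmd))
```

The returned list can be handed to subprocess.run once the member filesystems are mounted. Check the mhddfs man page on your own system for the options it supports before relying on this.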

That's it. No fuss, no RAID. If any drive fails, then only the files from that specific drive need to be restored from backup (let rsync figure it out) -- no days-long resync of a RAID array.
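To illustrate what "figuring it out" amounts to after a drive loss, here is a rough Python sketch that lists the files present in a backup tree but missing from the pool. It is not how rsync works internally, and it compares by relative path only, not by size or content. The function names relative_files and files_to_restore are invented for this example.

```python
import os


def relative_files(root):
    """Return the set of file paths under `root`, relative to `root`."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def files_to_restore(backup_root, pool_root):
    """List files that exist in the backup but are missing from the pool."""
    return sorted(relative_files(backup_root) - relative_files(pool_root))
```

In practice rsync does this comparison (and the copying) for you; the sketch just shows why only the lost drive's files need to travel.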

And the individual drives (and files!) are still fully accessible, whether or not they are mounted as part of the mhddfs array.

One thing I wish mhddfs would do by default is to keep leaf nodes grouped together on the same underlying drive/filesystem. E.g., if I have a directory called "Top Gear", then ideally all Top Gear episodes should be kept together within that directory on a single underlying drive, rather than potentially being scattered across multiple underlying drives.

Dunno why, but that's just how I'd like it to work, and so I've patched my copy here to do exactly that.
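The grouping idea can be sketched in a few lines of Python. To be clear, this is not the actual patch and not mhddfs's own placement code, just one plausible policy: prefer a drive that already holds the file's parent directory, and otherwise fall back to the drive with the most free space. The function choose_drive and the layout of the drives dictionary are assumptions made up for illustration.

```python
import os


def choose_drive(new_file, size, drives):
    """Pick the drive for `new_file` (a pool-relative path) of `size` bytes.

    `drives` maps a drive name to a dict with two keys:
      "free": bytes free on that drive
      "dirs": set of pool-relative directories already present on it
    """
    parent = os.path.dirname(new_file)
    # Only drives with enough room for the new file are eligible.
    with_room = [name for name, info in drives.items() if info["free"] >= size]
    if not with_room:
        raise OSError("no drive has enough free space")
    # Prefer a drive that already holds the parent directory,
    # so that sibling files stay together.
    siblings = [name for name in with_room if parent in drives[name]["dirs"]]
    candidates = siblings or with_room
    # Among the candidates, take the one with the most free space.
    return max(candidates, key=lambda name: drives[name]["free"])
```

With this policy a new "Top Gear" episode lands next to the existing ones as long as that drive has room, and only spills over to another drive when it does not.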

My main MythTV box now has two 3TB drives, plus one 2TB drive, with the space all pooled together using mhddfs. The backup array for that system uses mhddfs to combine space from four 2TB drives, with both setups conveniently totaling 8TB despite the different numbers and capacities of drives used.

Anyway, perhaps something like this might be of use to others out there. It sure has made managing our media collection much easier than before.

Cheers