Unofficial empeg BBS

#352797 - 25/06/2012 15:47 mhddfs -- Pooling filesystems together for bulk storage
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14484
Loc: Canada
---- begin preamble ---
RAID is a way of combining lots of hard drives into a common storage pool, with the ability to tolerate loss/failure of one or more drives. If a RAID has no redundant drives configured, then a failure of any single drive results in loss of ALL data from ALL drives, even the good ones.

So normally RAID systems incorporate at least one extra drive for data redundancy. If a single drive goes bad in this scenario, all data is still available until the next drive failure.
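To make the "extra drive" idea concrete, here is a toy sketch of RAID5-style single parity (an assumption for illustration; mirroring works differently). The redundant block is the XOR of the data blocks, so any ONE missing block can be recomputed from the survivors. The function names are made up for this example.

```shell
#!/usr/bin/env bash
# Toy model of single-parity (RAID5-style) redundancy.
# Each "drive" holds one small integer standing in for a whole block;
# real arrays apply the same XOR across every stripe of every disk.

# Parity block = XOR of all the data blocks given as arguments.
parity_of() {
  local p=0 b
  for b in "$@"; do
    (( p ^= b ))
  done
  echo "$p"
}

# Rebuild ONE lost block: XOR the parity with every surviving block.
# With two blocks missing there are two unknowns and no unique answer,
# which is why a second failure during a rebuild is fatal.
rebuild_missing() {
  local parity=$1
  shift
  parity_of "$parity" "$@"
}

d1=23 d2=180 d3=77
p=$(parity_of "$d1" "$d2" "$d3")
# Simulate losing d2, then recover it from d1, d3 and the parity block.
echo "rebuilt d2 = $(rebuild_missing "$p" "$d1" "$d3")"   # prints: rebuilt d2 = 180
```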

Meanwhile, one should replace the failed drive ASAP, and then sit through a day-long "RAID rebuild" session, biting the remains of one's fingernails while hoping a second failure (or discovery of an existing bad sector) doesn't kill the rebuild and result in total data loss.

From this it becomes apparent that RAID is not a substitute for a full back-up, or even a more elaborate system of backups.

And even just a simple "software crash", a.k.a. "improper shutdown", will result in the RAID wanting to spend a day or more doing yet another "rebuild" or resynchronization of the array (assuming multi-terabyte drives).

You may have guessed that I don't like RAID. :)
---- end preamble ---

In addition to a fragile running environment, RAID also makes it possible to have a single, LARGE filesystem spanning several physical drives. This is especially convenient for video (media) collections, with thousands of files, each of which measures gigabytes in size. Shuffling those around manually among a collection of small filesystems is not a joyful exercise, so using a RAID to create larger filesystems is a pretty commonplace workaround.

I recently discovered a better (for me) way to do this. There's a Linux FUSE filesystem called mhddfs for these situations. The rather awkward name parses as Multiple Hard Disk Drive FileSystem, or mhddfs for short.

To use it, one formats drives (or partitions) individually, one filesystem on each, then mounts (or attaches) those filesystems and uses mhddfs to combine (or pool) their storage into one massive higher-layer filesystem.

That's it. No fuss, no RAID. If any drive fails, then only the files from that specific drive need be restored from backup (let rsync figure it out) -- no days-long resync of a RAID array.
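Letting rsync compare the backup against the pool is usually all that's needed. If you'd like to see exactly what went missing first, a small helper can list files present in the backup but absent from the pooled mount. This is a sketch: the /backup and /drives paths and the list_missing name are hypothetical, and filenames containing newlines are not handled.

```shell
#!/usr/bin/env bash
# List files (as paths relative to BACKUP) that exist in the backup tree
# but are missing from the pooled mount, e.g. after one drive died.
# Usage: list_missing BACKUP_DIR POOL_DIR
list_missing() {
  local backup=$1 pool=$2 rel
  ( cd "$backup" && find . -type f ) | while IFS= read -r rel; do
    rel=${rel#./}
    # Anything the pool can no longer see needs restoring.
    [ -e "$pool/$rel" ] || printf '%s\n' "$rel"
  done
}

# Hypothetical paths; adjust to your own layout:
# list_missing /backup /drives | rsync -a --files-from=- /backup/ /drives/
```

The commented-out line feeds that list to rsync's `--files-from=-` option, so only the missing files are copied back, and mhddfs decides which drive each one lands on.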

And the individual drives (and files!) remain fully accessible whether or not they are mounted as part of the mhddfs array.

One thing I wish mhddfs would do by default is to keep leaf nodes grouped together on the same underlying drive/filesystem. E.g., if I have a directory called "Top Gear", then ideally all Top Gear episodes should be kept together within that directory on a single underlying drive, rather than potentially being scattered across multiple underlying drives.

Dunno why, but that's just how I'd like it to work, and so I've patched my copy here to do exactly that.
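The actual patch isn't shown here, but the idea can be sketched as a placement rule that runs before the normal one: if some drive already holds the target directory and still has enough free space, put the new file there. Everything in this block (the prefer_existing_dir name, passing free space as DRIVE:FREE pairs) is a hypothetical illustration, not mhddfs code.

```shell
#!/usr/bin/env bash
# Hypothetical "keep a directory's files together" placement rule.
# Usage: prefer_existing_dir RELDIR MLIMIT DRIVE:FREE [DRIVE:FREE ...]
# FREE and MLIMIT share a unit (e.g. GB) and are passed in so the logic
# is testable; a real implementation would query each filesystem.
# Prints the first drive that already contains RELDIR and still has at
# least MLIMIT free. Returns 1 if none qualifies ("no preference").
prefer_existing_dir() {
  local reldir=$1 mlimit=$2 entry drive free
  shift 2
  for entry in "$@"; do
    drive=${entry%:*}    # everything before the last colon
    free=${entry##*:}    # everything after the last colon
    if [ -d "$drive/$reldir" ] && (( free >= mlimit )); then
      echo "$drive"
      return 0
    fi
  done
  return 1
}
```

Returning 1 means "no preference", so a caller would then fall back to the usual policy of picking the first drive with enough free space.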

My main MythTV box now has two 3TB drives, plus one 2TB drive, with the space all pooled together using mhddfs. The backup array for that system uses mhddfs to combine space from four 2TB drives, with both setups conveniently totaling 8TB despite the different numbers and capacities of drives used.
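For comparison, a little arithmetic on those same drive sets, assuming a plain RAID5 built from whole drives (where every member only counts up to the size of the smallest one, and one drive's worth goes to parity). The function names are illustrative.

```shell
#!/usr/bin/env bash
# Usable space (whole TB) for a set of drive sizes:
#  - pooled (mhddfs-style): every drive contributes its full size
#  - single-parity RAID5: each member only counts up to the smallest
#    drive, and one drive's worth of space holds parity
pooled_tb() {
  local total=0 s
  for s in "$@"; do
    (( total += s ))
  done
  echo "$total"
}

raid5_tb() {
  local min=$1 s
  for s in "$@"; do
    if (( s < min )); then min=$s; fi
  done
  echo $(( ($# - 1) * min ))
}

echo "3+3+2 TB: pooled $(pooled_tb 3 3 2) TB, RAID5 $(raid5_tb 3 3 2) TB"
echo "4 x 2 TB: pooled $(pooled_tb 2 2 2 2) TB, RAID5 $(raid5_tb 2 2 2 2) TB"
```

Pooling uses every byte of mismatched drives, at the cost of having no redundancy at all, which is what the separate backup array is for.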

Anyway, perhaps something like this might be of use to others out there. It sure has made managing our media collection much easier than before.

Cheers

#352798 - 25/06/2012 15:58 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: mlord]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14484
Loc: Canada
Speaking of 3TB drives: I purchased the Western Digital "Green" drives for this setup. My initial observations of them are that (1) they are as mechanically quiet as the 2TB drives, BUT (2) they do vibrate more than the 2TB ones, and this may make them audible in some cases. I also wonder about endurance with all of that vibration.

The vibration from the extra platter is not bad -- less than a typical 7200rpm drive -- but it is noticeable when compared with the 2TB versions.

#352799 - 25/06/2012 16:15 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: mlord]
andy
carpal tunnel

Registered: 10/06/1999
Posts: 5914
Loc: Wivenhoe, Essex, UK
Luckily my data needs for my server still fit on a single disk (500GB data, 500GB of backups from other machines).

So for me, I just throw disks at the problem. I stick with a 3 disk RAID1 array, which means I can take any single drive out at any point and still have redundancy. And any drive I take out can be used to rebuild the full setup.

I'm planning on moving to XFS soon; I'll still be sticking with mirroring though (probably a 4 disk mirror now). I'll probably add an SSD as a cache device too, given that I have a 160GB one doing nothing at the moment.

XFS seems to be so much more robust than most other file systems.

mhddfs does sound interesting, but I'm glad I don't need it ;)
_________________________
Remind me to change my signature to something more interesting someday

#352800 - 25/06/2012 16:49 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: andy]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14484
Loc: Canada
Yeah, I like XFS here, too. That's what the MythTV box is using for the underlying filesystems that mhddfs then pools together.

Something I don't do (yet) is use an SSD to hold the logs for the XFS filesystems. In theory that should give a tremendous performance boost, but I just don't know if I trust that much chewing gum and string. :)

#352801 - 25/06/2012 17:11 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: mlord]
drakino
carpal tunnel

Registered: 08/06/1999
Posts: 7868
I was really hoping ZFS would have taken off. It offered the same type of storage pooling that mhddfs offers, and also offered redundancy if you wanted. It's very similar to how larger enterprise storage devices work, like the EVA series I supported at HP.

Some RAID controllers do offer a similar option, usually termed JBOD (Just a Bunch Of Disks). They operate at the block level rather than the file level, though, and just sit there filling up the first disk, then rolling over to the second, third, etc.
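A rough sketch of that block-level "fill the first disk, then roll over" mapping, assuming a simple linear concatenation with member sizes given in blocks (jbod_map is a hypothetical name). Because the filesystem on top spreads its metadata across this whole logical range, losing any one member tends to damage the filesystem as a whole, not just the files whose data lived on that disk.

```shell
#!/usr/bin/env bash
# Linear JBOD ("concatenation"): map a logical block number onto the
# member disks by filling disk 0, then disk 1, and so on.
# Usage: jbod_map LOGICAL_BLOCK SIZE0 [SIZE1 ...]   (sizes in blocks)
# Prints "DISK OFFSET", or returns 1 if the block is past the end.
jbod_map() {
  local lba=$1 disk=0 size
  shift
  for size in "$@"; do
    if (( lba < size )); then
      echo "$disk $lba"
      return 0
    fi
    # Skip past this member and try the next one.
    (( lba -= size ))
    (( disk += 1 ))
  done
  return 1
}

jbod_map 15 10 20   # block 15 of a 10+20 block JBOD: disk 1, offset 5
```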

#352802 - 25/06/2012 17:15 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: drakino]
wfaulk
carpal tunnel

Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
Yeah, but if you lose one disk of a JBOD, you still pretty much lose everything. The filesystem is still corrupted.
_________________________
Bitt Faulk

#352804 - 25/06/2012 18:29 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: mlord]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14484
Loc: Canada
There is a bit of a performance hit with mhddfs --> everything gets relayed from the kernel to the mhddfs task (userspace) and then back to the kernel and on to the original task.

The CPU usage is noticeable. Not huge, not a problem, but noticeable. I wouldn't put a news/mail server on mhddfs, but it's quite good for a home media server.

One project I might do if I get time/bored, is to write an in-kernel implementation of a simplified version of it, which would get rid of the double copying on reads/writes and make it all pretty much invisible performance-wise.

I don't need to do that, but it just looks like a fun way to play with the Linux VFS layer.

Cheers

#352805 - 25/06/2012 18:41 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: mlord]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14484
Loc: Canada
Originally Posted By: mlord
One project I might do if I get time/bored, is to write an in-kernel implementation of a simplified version of it, which would get rid of the double copying on reads/writes and make it all pretty much invisible performance-wise.


Mmmm.. thinking about this more now, and perhaps a better thing might be to write an extension to FUSE to allow file redirection to a different path, like symlinks do.

With that feature, mhddfs could continue to manage the directory structure, but then redirect actual file accesses to the REAL underlying file on its filesystem. Reads/writes would then happen completely in-kernel, at full performance, and things like mmap() would all work natively.

Nice and simple, and it's the kind of thing that might actually get accepted upstream into default kernels some day.

Cheers

#352810 - 25/06/2012 20:00 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: mlord]
Shonky
pooh-bah

Registered: 12/01/2002
Posts: 2009
Loc: Brisbane, Australia
The latest MythTV already supports storage groups. Not quite as complete as a "fused" filesystem like this, though.

unRAID uses a similar principle to join file systems together (and adds a RAID type functionality on top of that).
_________________________
Christian
#40104192 120Gb (no longer in my E36 M3, won't fit the E46 M3)

#352822 - 26/06/2012 00:05 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: Shonky]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14484
Loc: Canada
Storage Groups are an example of the "all software eventually evolves to implement email" syndrome. Not a core strength of the mythtv devs; best avoided.

Besides, they only really help with recordings, not videos. 95% of my MythTV stuff consists of "videos".

Cheers

#352827 - 26/06/2012 01:17 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: mlord]
msaeger
carpal tunnel

Registered: 23/09/2000
Posts: 3608
Loc: Minnetonka, MN
I did not know RAID was that fragile. I have no experience with it, but the way it was talked about when I went to school was that if you get a disk failure, you just slap in a new one and it fixes itself with no downtime.

I can totally believe that this is not true, but I am wondering: what do people use to back up multiple terabytes of stuff? Just more hard drives? I only back up pictures and documents, and I just put them online and hope I don't lose the local copy and the online copy at the same time :) but if I had to, I could fit all of it on DVDs.
_________________________

Matt

#352831 - 26/06/2012 01:44 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: msaeger]
Shonky
pooh-bah

Registered: 12/01/2002
Posts: 2009
Loc: Brisbane, Australia
"RAID is not a backup" and that's the problem. Many people do not get their head around it.

Mark: The videos module does support storage groups too.

I do like the sound of this, so I'm going to look into it further next time I rebuild my MythTV. Currently my recordings and videos are split up appropriately across a few drives, with some strategic links to the separate file systems, i.e., it doesn't share the space as efficiently, but it's close enough for now.
#352834 - 26/06/2012 02:11 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: msaeger]
drakino
carpal tunnel

Registered: 08/06/1999
Posts: 7868
RAID isn't as fragile as Mark makes it out to be, but it does have its downsides at times. And Christian is exactly right, RAID is not a backup solution. (well, unless you are doing the 3 disk RAID 1 trick Andy does, assuming one drive is kept out most of the time).

Much like many things, it all comes down to the quality of the implementation. There are some pretty bad RAID hardware controllers out there. And some pretty bad RAID software setups. But with a good setup, RAID can do its job of ensuring one (or two or multiple depending on the level) failures don't take out a system. RAID remains one of the easiest ways to raise performance of a storage subsystem when dealing with spinning drives. And modern servers changed the definition of RAID to mean things beyond hard drives. I've worked with RAIDed RAM before, in servers that cannot afford any downtime.

#352835 - 26/06/2012 04:22 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: msaeger]
wfaulk
carpal tunnel

Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
Originally Posted By: msaeger
I did not know RAID was that fragile. I have no experience with it, but the way it was talked about when I went to school was that if you get a disk failure, you just slap in a new one and it fixes itself with no downtime.

RAID can survive the failure of some number of drives, depending on the type of RAID, without downtime, assuming that there are no further issues. (Sometimes a single failed drive can take out an entire IO channel.) When you replace the drive, it will reconstruct the data on the failed drive.

The problem these days is that drives are so large that it takes a long time to recover that data, and you run the risk of another drive failing before the replacement disk's data gets rebuilt. If more drives fail than the RAID set can handle, your data is completely gone and you have to restore from backup. Add to that problem that drives will often fail in clusters, since drives from a single production run will often have very similar lifespans and usage patterns are very similar for all members of a RAID set, and you can get into problems more quickly than you'd like.

There are a number of ways to help alleviate those issues. One is to use "hot spares": one or more extra drives installed in the system and running idle, waiting for another drive to go bad. This minimizes the amount of time that a RAID set runs in degraded mode, since the rebuild doesn't have to wait for a person to physically replace the bad drive. Another is using a RAID type that can tolerate more drive failures. Typically, if someone says "RAID" without any qualifier, he probably means RAID5, which can tolerate the failure of a single drive. Other versions can tolerate other finite numbers of drives per RAID set (that is: two, three, etc.). And you can even combine the types together. Still other versions have complete duplicates of the data, so that every disk might have one or more backups. The tradeoff here is between an increasing number of drives needed to store a given amount of data and the number of drives that can be lost without resorting to a backup.
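The tradeoff in that last sentence can be tabulated with a little arithmetic. This sketch assumes N equal-sized drives and the textbook definitions (RAID1 as an N-way mirror, no RAID10 or vendor variants); raid_tradeoff is a made-up helper, and "failures" means how many drives can be lost with a guarantee of no data loss.

```shell
#!/usr/bin/env bash
# Usable capacity vs. guaranteed-tolerated drive failures for N equal
# drives of SIZE each. Usage: raid_tradeoff LEVEL N SIZE
raid_tradeoff() {
  local level=$1 n=$2 size=$3
  case $level in
    0) echo "usable=$(( n * size )) failures=0" ;;        # striping only
    1) echo "usable=$size failures=$(( n - 1 ))" ;;       # N-way mirror
    5) echo "usable=$(( (n - 1) * size )) failures=1" ;;  # single parity
    6) echo "usable=$(( (n - 2) * size )) failures=2" ;;  # double parity
    *) echo "unsupported level: $level" >&2; return 1 ;;
  esac
}

for lvl in 0 1 5 6; do
  echo "RAID$lvl, 4 x 2TB: $(raid_tradeoff "$lvl" 4 2)"
done
```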

In Mark's system, any drive failure results in a restore from backup, but only of the data that happened to reside on that one drive, not of all the data on all of the drives, which is what would happen on a RAID set that lost more drives than it could tolerate.
#352836 - 26/06/2012 05:23 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: drakino]
andy
carpal tunnel

Registered: 10/06/1999
Posts: 5914
Loc: Wivenhoe, Essex, UK
Originally Posted By: drakino
And Christian is exactly right, RAID is not a backup solution. (well, unless you are doing the 3 disk RAID 1 trick Andy does, assuming one drive is kept out most of the time).


That would be pretty unworkable, given it would have to copy the entire RAID contents every time you reconnected the third disk. I don't do that.

My backups are entirely separate from my RAID, CrashPlan FTW.
#352837 - 26/06/2012 06:38 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: andy]
Shonky
pooh-bah

Registered: 12/01/2002
Posts: 2009
Loc: Brisbane, Australia
Yep, so I have unRAID for local redundancy (it has the merged-filesystem functionality with separate file systems, as well as a parity drive). Also CrashPlan (offsite) FTW. I then keep a local CrashPlan backup copy on a second machine with the most important stuff like photos, so it's basically instant backup for HD failure and slower offsite backup for house-burns-down situations.

I don't back up everything, though. I can live with losing media like movies and TV shows if the house burns down, but it's nice to have some level of failsafe to cover a dead drive.


Edited by Shonky (26/06/2012 06:39)
#352838 - 26/06/2012 07:35 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: mlord]
peter
carpal tunnel

Registered: 13/07/2000
Posts: 4174
Loc: Cambridge, England
Originally Posted By: mlord
One project I might do if I get time/bored, is to write an in-kernel implementation of a simplified version of it, which would get rid of the double copying on reads/writes and make it all pretty much invisible performance-wise.

I don't need to do that, but it just looks like a fun way to play with the Linux VFS layer.

So, a bit like a union mount, but with both underlying FSes writable rather than just the top one?

Peter

#352852 - 26/06/2012 12:16 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: Shonky]
hybrid8
carpal tunnel

Registered: 12/11/2001
Posts: 7738
Loc: Toronto, CANADA
Originally Posted By: Shonky
"RAID is not a backup" and that's the problem.


"RAID is not an alternative to a backup" is more apt. Which means in practical terms, you should still have a backup of the important data on your RAID.
_________________________
Bruno
Twisted Melon : Fine Mac OS Software

#352854 - 26/06/2012 13:48 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: peter]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14484
Loc: Canada
Quote:
So, a bit like a union mount, but with both underlying FSes writable rather than just the top one?


That's more or less it (mhddfs), except all drives share the exact same directory structure, or at least they appear to. mhddfs automatically clones directories to other drives as needed when storing new files in the "same directory" as existing files.
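A sketch of that union-style behaviour, under stated assumptions: lookups go to the first underlying drive where the path exists, and before a new file is created on a chosen drive, its parent directories are created there too. The resolve and clone_parent names are hypothetical, and a real implementation would presumably also copy ownership and permissions when cloning directories.

```shell
#!/usr/bin/env bash
# Find which underlying drive holds RELPATH: first match wins.
# Usage: resolve RELPATH DRIVE [DRIVE ...]
resolve() {
  local rel=$1 drive
  shift
  for drive in "$@"; do
    if [ -e "$drive/$rel" ]; then
      echo "$drive"
      return 0
    fi
  done
  return 1
}

# Before creating RELPATH on TARGET, make sure its parent directory
# exists there as well (this sketch only creates it; it copies no
# ownership or permissions).
# Usage: clone_parent RELPATH TARGET
clone_parent() {
  local rel=$1 target=$2
  mkdir -p "$target/$(dirname "$rel")"
}
```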

It's really quite useful and exactly what a lot of us might want. Way better than JBOD. The code in mhddfs defaults to picking the first filesystem (drive) with sufficient free space (mount option configurable) when storing a new file. And if all drives are below the user's threshold, it instead chooses the filesystem with the most available space.
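The placement rule just described, sketched in shell. The real mhddfs reads free space from the filesystems and takes its threshold from the mlimit mount option; here free space is passed in as DRIVE:FREE pairs (same unit as the threshold, e.g. GB) so the rule can be exercised without real disks. pick_fs is a hypothetical name.

```shell
#!/usr/bin/env bash
# Usage: pick_fs MLIMIT DRIVE:FREE [DRIVE:FREE ...]
pick_fs() {
  local mlimit=$1 entry drive free best="" best_free=-1
  shift
  for entry in "$@"; do
    drive=${entry%:*}
    free=${entry##*:}
    # First drive (in the order given) with at least MLIMIT free wins.
    if (( free >= mlimit )); then
      echo "$drive"
      return 0
    fi
    # Otherwise remember the roomiest drive seen so far as a fallback.
    if (( free > best_free )); then
      best=$drive
      best_free=$free
    fi
  done
  # Every drive is below the threshold: use the one with the most space.
  [ -n "$best" ] || return 1
  echo "$best"
}

pick_fs 500 /drive1:100 /drive2:800 /drive3:900   # first with >= 500: /drive2
pick_fs 500 /drive1:100 /drive2:300 /drive3:200   # none qualify: most free, /drive2
```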

It's also got a fancy feature to automatically relocate a file if, while writing to it, it runs out of space on the currently chosen drive. I don't really need that feature, and will likely get rid of it if/when I re-implement things.

So from that, one can see that if each of four drives has only 100MB of free space, it still cannot store a 200MB file --> the minor constraint is that each file must fit onto a single filesystem (drive). Not really an issue, but there you have it.
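That constraint amounts to a one-line check: a file fits only if some single drive has room for it, no matter how much free space the pool has in total. A minimal sketch (can_store is a made-up helper; sizes in MB):

```shell
#!/usr/bin/env bash
# Usage: can_store FILE_SIZE FREE_ON_DRIVE1 [FREE_ON_DRIVE2 ...]
# Succeeds only if at least one drive alone can hold the whole file.
can_store() {
  local size=$1 free
  shift
  for free in "$@"; do
    if (( free >= size )); then
      return 0
    fi
  done
  return 1
}

# Four drives with 100MB free each: 400MB free in total, yet no room
# for a single 200MB file.
if can_store 200 100 100 100 100; then echo "fits"; else echo "does not fit"; fi
```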

Cheers

#352855 - 26/06/2012 14:32 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: Shonky]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14484
Loc: Canada
Originally Posted By: Shonky
I do like the sound of this, so I'm going to look into it further next time I rebuild my MythTV. Currently my recordings and videos are split up appropriately across a few drives, with some strategic links to the separate file systems, i.e., it doesn't share the space as efficiently, but it's close enough for now.


When I set it up here, I actually just used the existing separate filesystems, very similar to what you have there right now. That sounds pretty much like how things were set up here before mhddfs.

Part of the real beauty of it all is that you don't have to do much to layer mhddfs onto the existing hodgepodge. :)

Just tidy up the current directory structures if needed, so that each existing filesystem uses a common/compatible directory layout, and then fire up mhddfs to pool them all at a new mount point. The existing filesystems are still there, mounted, and 100% functional and 100% safe, but you then have the option to slowly migrate things to begin using the new mount point.

mhddfs does not save any metadata of its own anywhere. It just redirects accesses one at a time as they happen, with no saved state. So it's easy to adopt over top of existing stuff, and just as easy to get rid of if one changes their mind.

Really, this is something that should already be in the stock kernels, maybe called "poolfs" or something. Heck, even the empeg could have used this had it existed in the day.

Cheers


Edited by mlord (26/06/2012 14:51)

#352856 - 26/06/2012 14:43 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: mlord]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14484
Loc: Canada
On my MythTV box, the three big drives are mounted at mount points named /drive1, /drive2, and /drive3.

The mhddfs is mounted at /drives, pooling the other three filesystems into a common mass, with these two commands:

mkdir -p /drives
mhddfs /drive1,/drive2,/drive3 /drives -o \
allow_other,auto_cache,max_write=4194304,uid=1000,gid=1000,mlimit=500G


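Most of those mount options aren't really needed, but I want everything on the filesystem to be "owned" by user "mythtv" (uid/gid 1000), and I want to allow larger write() sizes (max_write=4194304 bytes, i.e. 4 MiB) than the piddly default (originally designed for .mp3 files). The mlimit=500G option sets the free-space threshold used when picking a drive for new files.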

A much simplified way, just to try things, would be this:

mkdir -p /drives
mhddfs /drive1,/drive2,/drive3 /drives
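To have the pool mounted at boot rather than by hand, FUSE filesystems of this era can also go in /etc/fstab using the program#source form. A sketch of the simplified setup above, assuming your mhddfs version documents the mhddfs#dir1,dir2 syntax (check its man page) and that /drive1 to /drive3 are already mounted when this entry is processed:

```
# /etc/fstab sketch: pool /drive1, /drive2 and /drive3 at /drives
mhddfs#/drive1,/drive2,/drive3 /drives fuse defaults,allow_other 0 0
```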



#352857 - 26/06/2012 14:49 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: andy]
tahir
pooh-bah

Registered: 27/02/2004
Posts: 1900
Loc: London
How does read/write speed compare to RAID?

It's definitely painful waiting for a RAID array to rebuild.

#352860 - 26/06/2012 18:02 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: tahir]
siberia37
old hand

Registered: 09/01/2002
Posts: 702
Loc: Tacoma,WA
How does mhddfs handle splitting up data across the drives? Does it try to distribute data across the "array" so that if one drive dies, the minimum amount of data is lost?

#352862 - 26/06/2012 18:54 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: siberia37]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14484
Loc: Canada
Originally Posted By: siberia37
How does mhddfs handle splitting up data across the drives?

Originally Posted By: mlord
The code in mhddfs defaults to picking the first filesystem (drive) with sufficient free space (mount option configurable) when storing a new file. And if all drives are below the user's threshold, it instead chooses the filesystem with the most available space.

#352863 - 26/06/2012 18:57 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: tahir]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14484
Loc: Canada
Originally Posted By: tahir
How does read/write speed compare to RAID?

It's definitely painful waiting for a RAID array to rebuild.

I'm not sure what you are asking about there. With mhddfs there is nothing to "rebuild". In the unlikely event that a drive has to be replaced, one has to copy back only the files that were on that drive. The time needed for that depends on how full the drive was, but it will nearly always take significantly less time than a RAID resync, unless the drive was very full of very small files or something.

Cheers

#352869 - 27/06/2012 07:38 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: mlord]
tahir
pooh-bah

Registered: 27/02/2004
Posts: 1900
Loc: London
I was just saying that RAID is a pain when it goes wrong. If there's no performance loss, then it might be a useful alternative to RAID even for some of my work network.


Edited by tahir (27/06/2012 07:39)

#352870 - 27/06/2012 10:26 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: tahir]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14484
Loc: Canada
Yeah, I don't like RAID resyncs either -- they are the elephant-sized flaw in most current RAID implementations -- RAID really needs a journal of some sort to know what actually needs resyncing, so that it doesn't have to mindlessly read 6TB of data (and write 2TB) to resync an 8TB array.
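Some back-of-the-envelope arithmetic on why a full resync takes so long. Member drives are swept in parallel, so the duration is roughly one member's size divided by the per-drive rate. The rates below are assumptions, and Linux md deliberately slows resync down when the array is busy (bounded by /proc/sys/dev/raid/speed_limit_min and speed_limit_max). resync_hours is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Rough full-resync duration in whole hours (rounded down).
# A resync with no record of what changed must sweep every sector of
# every member; members run in parallel, so time ~ member size / rate.
# Usage: resync_hours MEMBER_SIZE_TB MB_PER_S   (MB_PER_S is assumed)
resync_hours() {
  local drive_tb=$1 mb_per_s=$2
  # Decimal units, as drive vendors count: 1 TB = 1,000,000 MB.
  echo $(( drive_tb * 1000000 / mb_per_s / 3600 ))
}

echo "2TB members, idle array (~100 MB/s): ~$(resync_hours 2 100) h"
echo "2TB members, busy array (~20 MB/s): ~$(resync_hours 2 20) h"
```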

That's just mindless, and a big reason I don't use RAID for anything around the home office here.

But.. mhddfs is not suitable for office use. The code is clean, but it is in userspace not the kernel, so it will not cope well with heavy multi-user loads. Performance would suck, and it might even run into trouble when multiple users are creating/updating a common subset of files/directories.

Good for my one/two user media server, not good for a mail or database server. An in-kernel version would be much, much better for that, which is why I'm rather shocked we don't already have an in-kernel solution.

Cheers

#352871 - 27/06/2012 10:29 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: mlord]
tahir
pooh-bah

Registered: 27/02/2004
Posts: 1900
Loc: London
Gotcha, thanks

#352873 - 27/06/2012 11:42 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: mlord]
LittleBlueThing
addict

Registered: 11/01/2002
Posts: 612
Loc: Reading, UK
First, I know, I'm biased and 'like' RAID :)

This isn't an attempt to persuade anyone that RAID is the right answer for them - there's a lot it can't do that this type of approach can - effective use of mismatched drive sizes for instance.

Originally Posted By: mlord
And even just a simple "software crash", a.k.a. "improper shutdown", will result in the RAID wanting to spend a day or more doing yet another "rebuild" or resynchronization of the array (assuming multi-terabyte drives).


For this issue, have you come across RAID bitmaps?

They typically mean that a dirty shutdown even of an actively writing multi-TB raid will often be cleaned before it's even mounted. Yes you can add one to an existing RAID. Obviously they're not useful when a drive fails completely.
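For reference, on Linux md a bitmap can be added to an existing array with something like `mdadm --grow --bitmap=internal /dev/md0` (the device name is an assumption). The idea behind it, as a toy model rather than md's actual implementation: before a region of the array is written, its bit is set; once the write is safely on all members, the bit can be cleared. After a crash, only regions whose bits are still set need resyncing.

```shell
#!/usr/bin/env bash
# Toy write-intent bitmap. Index = chunk number, value 1 = "dirty".
dirty=()

mark_dirty() { dirty[$1]=1; }         # before writing chunk $1
mark_clean() { unset "dirty[$1]"; }   # after chunk $1 is in sync again

# Chunks a post-crash resync would revisit; bash lists the indices of
# an indexed array in ascending order.
chunks_to_resync() {
  local c
  for c in "${!dirty[@]}"; do
    echo "$c"
  done
}

# Writes to chunks 3, 7 and 42 begin; only chunk 7 completes before a crash.
mark_dirty 3
mark_dirty 7
mark_dirty 42
mark_clean 7
chunks_to_resync   # prints 3 and 42; nothing else needs checking
```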

I also note that all data 'lost' when a drive dies under mhddfs is not available until the restore is done. Typically RAID provides zero downtime.

The real (and painful) risk of a second failure when a drive does fail means RAID6 or more highly redundant setups are often a better option if you really want to avoid downtime. I'm now using RAID6.

Interestingly, you don't address the issue of the backup solution (a non-redundant cold spare?) failing as you read possibly many TB of data from it. Isn't that the same problem as biting your nails whilst re-syncing an array with a new drive?


Edited by LittleBlueThing (27/06/2012 11:44)
Edit Reason: link to bitmap page
_________________________
LittleBlueThing Running twin 30's

#352877 - 27/06/2012 13:04 Re: mhddfs -- Pooling filesystems together for bulk storage [Re: LittleBlueThing]
tanstaafl.
carpal tunnel

Registered: 08/07/1999
Posts: 5543
Loc: Ajijic, Mexico
Originally Posted By: LittleBlueThing
(a non-redundant cold spare?)
Key word here being "non-redundant". My backups are redundant, although I have to admit that the off-site copies are not as current as the on-site ones, usually a couple months out of date.

But, none of my data is mission-critical. I'd miss it if it were lost, but I wouldn't lose any sleep over it, and most of it could be [time-consumingly] recreated if I had to.

tanstaafl.
_________________________
"There Ain't No Such Thing As A Free Lunch"
