#352878 - 27/06/2012 13:20
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: LittleBlueThing]
|
carpal tunnel
Registered: 29/08/2000
Posts: 14493
Loc: Canada
|
> Interestingly you don't address the issue of the backup solution (a non-redundant cold spare?) failing as you read possibly many TB of data from it?
The backup copy (or copies) has the same issues/concerns as the original array has. Even with RAID(6) one needs a backup. The backup can be a RAID, mhddfs or IBM reel-to-reel. It's a separate filesystem, and can be designed however one likes.
For me, I'm using non-redundant mhddfs again to bond four 2TB drives. These are just media files -- a pain to replace, but not a catastrophe if lost. So I'm happy with a single backup copy of everything, effectively giving me RAID1 on a per-file basis (one copy on the live system, one on the backup), but without the headaches of RAID.
Mechanical drives seldom die outright without warning. The failure mode is more typically bad sectors accumulating (dust/dirt scratching the platters), so it might lose a few files, but probably not the entire filesystem. And with mhddfs the damage is limited to only that one drive, not all drives.
RAID *needs* RAID, because it throws everything into a single large basket, where a single drive failure loses everything unless redundant drives are configured. And RAID multiplies the probability of total failure by the number of drives, making a catastrophe more likely.
Cheers
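A rough way to put numbers on that last point: the sketch below compares a striped, non-redundant array (any single drive failure loses everything) with a per-file pool like mhddfs (a failure loses only the files on that drive). The 5% per-drive failure probability, the independence of failures, and the even spread of files are all assumptions for illustration, not figures from this thread.

```python
def p_any_drive_fails(p_single: float, n_drives: int) -> float:
    """Probability that at least one of n independent drives fails."""
    return 1.0 - (1.0 - p_single) ** n_drives


def expected_loss_fraction_pool(p_single: float) -> float:
    """Expected fraction of data lost in a non-redundant per-file pool.

    Assumes files are spread evenly, so each drive holds 1/n of the data
    and a dead drive takes only its own files with it: n * p * (1/n) = p.
    """
    return p_single


# Hypothetical numbers: four drives, 5% chance each of failing in some period.
p, n = 0.05, 4
striped_total_loss = p_any_drive_fails(p, n)      # whole array gone
pool_expected_loss = expected_loss_fraction_pool(p)
print(f"striped, no redundancy: {striped_total_loss:.1%} chance of losing everything")
print(f"per-file pool: about {pool_expected_loss:.1%} of the data lost on average")
```

For small failure probabilities the first figure is close to n * p, which is the "multiplied by the number of drives" rule of thumb.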
|
#352879 - 27/06/2012 13:22
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: mlord]
|
carpal tunnel
Registered: 08/06/1999
Posts: 7868
|
> Yeah, I don't like RAID resyncs either -- they are the elephant sized flaw in most current RAID implementations -- RAID really needs a journal of some sort to know what actually needs resyncing, so that it doesn't have to mindlessly read 6TB of data (and write 2TB) to resync an 8TB array.
Which RAID level are you talking about? I'm going to guess RAID 5 based on your comment. RAID 5 maximizes pooled space while keeping enough redundancy to survive a single drive failure, and that design is what makes its rebuilds so extensive. Other RAID levels can help to minimize rebuilds, with the tradeoff of not having as much available space. RAID 0+1 or 1+0 setups wouldn't need to read all 6TB of data to rebuild; instead they would only read the good partner drive. The downside is that all rebuild activity is focused on one drive, so performance can go down dramatically if rebuild priority is high.
RAID is a block-level setup, very much disconnected from the file system and from what the OS is doing. Adding a file-aware journal would also require a decent bit of rework of how RAID functions. The bitmaps solution LittleBlueThing linked to does look like a good attempt at this though, for addressing improper shutdowns on the RAID setups affected by them.
Most of the pure RAID systems I work with are hardware based, with controllers that offer support for battery backed cache. This is the other method to mitigate the issue with a bad shutdown.
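The rebuild figures above can be checked with a little arithmetic. This is a minimal sketch that assumes the quoted 8TB array is four 2TB drives (which matches the 6TB-read / 2TB-write numbers) and that a rebuild touches every sector of the drives involved; the function name is made up for illustration.

```python
def rebuild_io_tb(level: str, n_drives: int, drive_tb: float) -> tuple:
    """Return (TB read, TB written) to rebuild one replaced drive.

    Simplified model: a full rebuild with no bitmap or filesystem awareness.
    """
    if level == "raid5":
        # Every surviving member is read in full to recompute the lost drive.
        return ((n_drives - 1) * drive_tb, drive_tb)
    if level == "raid10":
        # Only the mirror partner of the failed drive is read.
        return (drive_tb, drive_tb)
    raise ValueError(f"unsupported level: {level}")


print(rebuild_io_tb("raid5", 4, 2))   # four 2TB drives in RAID 5
print(rebuild_io_tb("raid10", 4, 2))  # same drives as mirrored pairs
```

The mirrored layout reads a third as much in this example, but all of that load lands on a single partner drive, which is the performance downside mentioned above.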
|
#352880 - 27/06/2012 13:25
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: LittleBlueThing]
|
carpal tunnel
Registered: 29/08/2000
Posts: 14493
Loc: Canada
|
> For this issue, have you come across raid bitmaps?
No, I hadn't. Those sound exactly like the journal mechanism I so wishfully imagined. Definitely on my MUST-HAVE list should I ever succumb to setting up a RAID again.
Oh, and I notice that someone at OLS this year will be discussing making RAID rebuilds more intelligent by using the filesystem metadata to only sync actual allocated data areas, rather than blindly syncing all sectors.
With a setup that does both of those, RAIDs would be much less painful.
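To make the bitmap idea concrete, here is a toy model of a write-intent bitmap: a chunk is marked before a write starts and cleared once all writes to it have completed, so after an unclean shutdown only the chunks still marked need resyncing. This is a sketch of the concept only, not how any particular RAID implementation stores or flushes its bitmap; the class name and the 64MiB chunk size are assumptions.

```python
from collections import Counter

MIB = 2 ** 20


class WriteIntentBitmap:
    """Toy model: track which chunks have writes in flight."""

    def __init__(self, chunk_bytes: int):
        self.chunk_bytes = chunk_bytes
        self.in_flight = Counter()  # chunk index -> number of pending writes

    def _chunks(self, offset: int, length: int) -> range:
        first = offset // self.chunk_bytes
        last = (offset + length - 1) // self.chunk_bytes
        return range(first, last + 1)

    def begin_write(self, offset: int, length: int) -> None:
        # Mark every chunk the write touches *before* the data goes out.
        for chunk in self._chunks(offset, length):
            self.in_flight[chunk] += 1

    def end_write(self, offset: int, length: int) -> None:
        # Clear the mark once the last pending write to a chunk is done.
        for chunk in self._chunks(offset, length):
            self.in_flight[chunk] -= 1
            if self.in_flight[chunk] == 0:
                del self.in_flight[chunk]

    def resync_bytes(self) -> int:
        """Bytes that would need resyncing if the power failed right now."""
        return len(self.in_flight) * self.chunk_bytes


bitmap = WriteIntentBitmap(chunk_bytes=64 * MIB)
bitmap.begin_write(offset=0, length=4096)
bitmap.begin_write(offset=64 * MIB - 1, length=2)  # straddles two chunks
print(bitmap.resync_bytes() // MIB, "MiB to resync, instead of the whole array")
```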
Edited by mlord (27/06/2012 13:26)
|
#352883 - 27/06/2012 13:43
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: mlord]
|
carpal tunnel
Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
|
> Mechanical drives seldom die outright without warning.
Sorry, Mark, but I'm calling bullshit on this one. Drives die suddenly all the time. I have seldom had any useful warning that a drive was about to go bad, from SMART to small errors. The vast majority of them just up and die. Google's experience is not as extreme as mine, but it still belies your claim:
> Out of all failed drives, over 56% of them have no count in any of the four strong SMART signals, namely scan errors, reallocation count, offline reallocation, and probational count. In other words, models based only on those signals can never predict more than half of the failed drives. Figure 14 shows that even when we add all remaining SMART parameters (except temperature) we still find that over 36% of all failed drives had zero counts on all variables.
_________________________
Bitt Faulk
|
#352884 - 27/06/2012 13:50
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: mlord]
|
carpal tunnel
Registered: 08/06/1999
Posts: 7868
|
> Oh, and I notice that someone at OLS this year will be discussing making RAID rebuilds more intelligent by using the filesystem metadata to only sync actual allocated data areas, rather than blindly syncing all sectors.
This I'm curious to read more about to see how it is implemented. Almost sounds like ZFS levels of redundancy, without possibly having to commit to that filesystem. The tighter coupling would help eliminate your main gripe with RAID.
Another interesting way to handle this would be to signal the RAID system with TRIM (or UNMAP on SCSI systems). The RAID system could keep running at the block level with no filesystem knowledge, but still gain the benefits of knowing exactly what would need to be rebuilt in a failure.
|
#352894 - 27/06/2012 14:45
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: wfaulk]
|
carpal tunnel
Registered: 29/08/2000
Posts: 14493
Loc: Canada
|
>> Mechanical drives seldom die outright without warning.
> Sorry, Mark, but I'm calling bullshit on this one.
Okay, I'll reword that for you: I have NEVER, in my 20 years as an industry storage expert, had one of my mechanical drives just die outright without software-detectable advance warning. Except for empeg drives.
Of course drives do die that way, I suppose. It's just a heck of a lot less common than the gradual failure situations.
Edited by mlord (27/06/2012 14:48)
|
#352895 - 27/06/2012 14:49
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: drakino]
|
carpal tunnel
Registered: 29/08/2000
Posts: 14493
Loc: Canada
|
> Another interesting way to handle this would be to signal the RAID system with TRIM (or UNMAP on SCSI systems). The RAID system could keep running at the block level with no filesystem knowledge, but still gain the benefits of knowing exactly what would need to be rebuilt in a failure.
Big commercial arrays are already doing that with Linux, so, yeah!
|
#352897 - 27/06/2012 14:58
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: mlord]
|
carpal tunnel
Registered: 29/08/2000
Posts: 14493
Loc: Canada
|
Mmm.. now that you mention it, there also was that whole IBM/Hitachi "DeathStar" crap, where large batches of drives would die electronically without warning. I guess maybe that's what prompted the whole "home RAID" revolution.
|
#352904 - 27/06/2012 18:49
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: mlord]
|
carpal tunnel
Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
|
> Of course drives do die that way, I suppose. It's just a heck of a lot less common than the gradual failure situations.
Not according to Google's real-world survey of 100,000 drives.
_________________________
Bitt Faulk
|
#352915 - 27/06/2012 23:31
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: wfaulk]
|
carpal tunnel
Registered: 23/09/2000
Posts: 3608
Loc: Minnetonka, MN
|
Whether you think they die suddenly or not maybe depends on how closely you are watching.
_________________________
Matt
|
#352916 - 28/06/2012 01:05
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: msaeger]
|
pooh-bah
Registered: 15/01/2002
Posts: 1866
Loc: Austin
|
Mark, is this your office? http://imgur.com/a/6ZG5e
|
#352917 - 28/06/2012 01:09
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: RobotCaleb]
|
veteran
Registered: 21/03/2002
Posts: 1424
Loc: MA but Irish born
|
Doubtful. The craftsmanship of the woodwork is poor...
|
#352918 - 28/06/2012 01:13
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: RobotCaleb]
|
carpal tunnel
Registered: 08/06/1999
Posts: 7868
|
Ouch, no vibration damping; those drives probably aren't too happy.
|
#352921 - 28/06/2012 06:59
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: mlord]
|
addict
Registered: 11/01/2002
Posts: 612
Loc: Reading, UK
|
>> For this issue, have you come across raid bitmaps?
> No, I hadn't.
Glad to be of service.
> With a setup that does both of those, RAIDs would be much less painful.
Ah, there's more... I proposed something years back that seems to be making its way through Neil's list: when a drive is about to fail, insert a new hot spare, fast mirror from the failing drive to the new drive, and only look at the rest of the RAID for parity when a bad block is encountered. Meanwhile all new writes go to the transient mirror. This speeds up resync/restore for a failed drive massively, and you essentially only rely on RAID resilience for the few failing blocks, not all the TB of data.
Finally, put a RAID 1 SSD bcache in front of a RAID6 backend (with working barriers... one day...) and I think that becomes a very, very fast and reliable system.
_________________________
LittleBlueThing
Running twin 30's
|
#352922 - 28/06/2012 07:04
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: drakino]
|
pooh-bah
Registered: 12/01/2002
Posts: 2009
Loc: Brisbane, Australia
|
That's a bit ghetto if you ask me for that amount of data.
_________________________
Christian #40104192 120Gb (no longer in my E36 M3, won't fit the E46 M3)
|
#352967 - 28/06/2012 23:56
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: wfaulk]
|
carpal tunnel
Registered: 29/08/2000
Posts: 14493
Loc: Canada
|
>> Of course drives do die that way, I suppose. It's just a heck of a lot less common than the gradual failure situations.
> Not according to Google's real-world survey of 100,000 drives.
Well, they ought to know! I wonder what happens to the data when DeathStar drives are removed from the mix? Those had a huge electronic failure rate, never seen before or since, I believe.
Thanks Bitt!
|
#352968 - 29/06/2012 00:12
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: mlord]
|
carpal tunnel
Registered: 25/12/2000
Posts: 16706
Loc: Raleigh, NC US
|
It says:
> The data used for this study were collected between December 2005 and August 2006.
The DeathStars were back in 2001, so I imagine that none of them were involved at all. Also,
> None of our SMART data results change significantly when normalized by drive model. The only exception is seek error rate, which is dependent on one specific drive manufacturer
_________________________
Bitt Faulk
|
#355990 - 31/10/2012 02:37
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: mlord]
|
new poster
Registered: 31/10/2012
Posts: 3
|
> One thing I wish mhddfs would do by default, is to keep leaf nodes grouped together on the same underlying drive/filesystem. Eg. If I have a directory called "Top Gear", then ideally all Top Gear episodes should be kept together within that directory on a single underlying drive, rather than potentially being scattered across multiple underlying drives.
> Dunno, why, but that's just how I'd like it to work, and so I've patched my copy here to do exactly that.
Hi mlord, are you able to provide your patch changes? I would also like mhddfs to function like this and only write a new directory path to another drive when the original source drive is full.
|
#356000 - 31/10/2012 11:15
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: aTTila]
|
carpal tunnel
Registered: 29/08/2000
Posts: 14493
Loc: Canada
|
>> One thing I wish mhddfs would do by default, is to keep leaf nodes grouped together on the same underlying drive/filesystem. Eg. If I have a directory called "Top Gear", then ideally all Top Gear episodes should be kept together within that directory on a single underlying drive, rather than potentially being scattered across multiple underlying drives.
>> Dunno, why, but that's just how I'd like it to work, and so I've patched my copy here to do exactly that.
> Hi mlord, are you able to provide your patch changes? I would also like mhddfs to function like this and only write a new directory path to another drive when the original source drive is full
Patch attached.
Attachments
01_group_files_on_same_fs.patch (254 downloads)
Description: mhddfs patch to group leaf files from same subdir onto same drive.
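The placement behaviour being asked for can be sketched roughly as follows. This is not the attached patch, whose contents are not shown in this thread; it is a hypothetical policy written from the description alone: prefer a drive that already holds the parent directory, and fall back to the drive with the most free space only when none of those can take the file. The function name, drive paths and sizes are all made up.

```python
GB = 10 ** 9


def choose_drive(free_bytes: dict, holders: list, file_size: int):
    """Pick a member drive for a new file.

    free_bytes: drive path -> free space in bytes
    holders:    drives that already contain the file's parent directory
    Returns the chosen drive path, or None if nothing has room.
    """
    # First choice: keep the file next to its siblings.
    for drive in holders:
        if free_bytes[drive] >= file_size:
            return drive
    # Otherwise spill over to whichever drive has the most room.
    candidates = [d for d, free in free_bytes.items() if free >= file_size]
    if not candidates:
        return None
    return max(candidates, key=free_bytes.get)


free = {"/drive1": 5 * GB, "/drive2": 900 * GB, "/drive3": 400 * GB}
print(choose_drive(free, holders=["/drive1"], file_size=1 * GB))  # stays with siblings
print(choose_drive(free, holders=["/drive1"], file_size=9 * GB))  # spills to another drive
```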
|
#356044 - 31/10/2012 21:26
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: mlord]
|
new poster
Registered: 31/10/2012
Posts: 3
|
Thanks for that, mate. It is now writing to the same subdirectory, but it doesn't seem to duplicate the directory structure on another drive with free space when the original drive is full?
Edited by aTTila (31/10/2012 22:45)
|
#356050 - 01/11/2012 00:47
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: aTTila]
|
carpal tunnel
Registered: 29/08/2000
Posts: 14493
Loc: Canada
|
Mmm.. dunno about that. My drives haven't gotten full yet. Could be a bug in the patch, or maybe something else (?).
|
#356051 - 01/11/2012 00:50
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: mlord]
|
carpal tunnel
Registered: 29/08/2000
Posts: 14493
Loc: Canada
|
Oh, it could have something to do with this code from the patch:
+ off_t threshold, move_limit = mhdd.move_limit;
+ if (move_limit <= 100)
+     threshold = 10; /* minimum percent-free */
+ else
+     threshold = 200ll * 1024 * 1024 * 1024; /* minimum space free */
I undoubtedly put that 200GB value in there as a suitable threshold for my own 2TB drives, which may not be suitable for your drives (?). But if you specify a "move limit" mount option less than 100 (eg. -o mlimit=20), it's supposed to use the 10% rule instead of the 200GB threshold.
-ml
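To see what those two branches mean for a given drive, here is a small Python model of the snippet. Only the threshold selection comes from the patch fragment above; how the patch then compares a drive's free space against that threshold is not shown in this thread, so the `has_room` check below is an assumption, and both function names are invented for illustration.

```python
GIB = 1024 ** 3


def pick_threshold(move_limit: int) -> tuple:
    """Mirror the patch fragment: a small move limit selects the percent rule."""
    if move_limit <= 100:
        return ("percent", 10)       # minimum percent free
    return ("bytes", 200 * GIB)      # minimum space free


def has_room(free_bytes: int, total_bytes: int, move_limit: int) -> bool:
    """Assumed use of the threshold: is this drive still above the minimum?"""
    kind, threshold = pick_threshold(move_limit)
    if kind == "percent":
        return 100 * free_bytes / total_bytes >= threshold
    return free_bytes >= threshold


# Hypothetical 2000 GiB drive, with a 4 GiB move limit and with mlimit=20.
for free_gib in (300, 100, 5):
    print(free_gib, "GiB free:",
          has_room(free_gib * GIB, 2000 * GIB, move_limit=4 * GIB),
          has_room(free_gib * GIB, 2000 * GIB, move_limit=20))
```

Under this reading, a fixed 200GB floor is reasonable for 2TB members but would be far too high for much smaller drives, which is where the percent rule would be the better fit.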
Edited by mlord (01/11/2012 00:52)
|
#356053 - 01/11/2012 01:13
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: mlord]
|
new poster
Registered: 31/10/2012
Posts: 3
|
Thanks again for your replies mlord, much appreciated. The drive that contains the subdir I'm trying to write to is a 2TB drive. It has ~5GB left, and my test file is, say, 9GB. When I cp this file to my mhddfs mount, instead of writing the directory structure to another free drive and placing the file there, it just fails with an 'out of space' error.
If it helps, my move size limit is at the default of 4GB.
Edited by aTTila (01/11/2012 01:20)
|
#356190 - 09/11/2012 12:04
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: aTTila]
|
carpal tunnel
Registered: 29/08/2000
Posts: 14493
Loc: Canada
|
Yeah, so you are probably better off without that patch, then. It must have a bug in there of some kind.
Cheers
|
#356806 - 14/12/2012 21:25
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: mlord]
|
new poster
Registered: 14/12/2012
Posts: 2
|
Hi everybody,
I'm new here, directed to this post after googling for information about mhddfs.
I'm building a home NAS server, mainly for storing media files and, like mlord, I don't think a RAID setup is a good approach for this kind of server. Well, at least not a standard (hardware or software) RAID setup. The reasons have already been laid out in this thread, so I'm not going to repeat them. Suffice it to say that, for a home server, the risk of losing ALL the data in the array in any of the multiple ways this can happen on a RAID, plus the inability to mix mismatched drives, is not acceptable, IMHO.
But, as I can't afford a duplicated storage setup like the one mlord has for backup, I want some kind of redundancy on mine, so that I can recover from a single drive failure (or maybe more) without a full backup. That's why I'm considering SnapRAID. I guess you know it, but just in case, take a look at: SnapRAID
On top of that, I also want to be able to look at all my drives as a single storage pool, which is why I'm also thinking of using mhddfs. But it seems that mhddfs takes a performance hit and also noticeably increases CPU use. As I also want to use the same machine as a MythTV backend that should be able to record several channels at a time while streaming to several frontends, I'm a little concerned about anything that costs drive performance and/or CPU cycles.
So, I was thinking of using the mhddfs mount only for exporting the drive pool through samba/nfs for reading, and then using the separate drive mounts for writing (directly on the server for the MythTV recordings, or from exported samba/nfs shares). From what mlord has said, it seems that this is possible:
> And the individual drives (and files!) are still fully accessible when (or not) mounted as part of the mhddfs array.
> Part of the real beauty of it all, is that you don't have to do much to layer mhddfs onto the existing hodgepodge.
> Just tidy up the current directory structures if needed, so that each existing filesystem uses a common/compatible directory layout, and then fire up mhddfs to pool them all at a new mount point. The existing filesystems are still there, mounted, and 100% functional and 100% safe, but you then have the option to slowly migrate things to begin using the new mount point. mhddfs does not save any metadata of its own anywhere. It just redirects accesses one at a time as they happen, with no saved state. So it's easy to adopt over top of existing stuff, and just as easy to get rid of if one changes their mind.
I know that this way I will lose mhddfs's ability to automatically fill one drive and then write to the next, but having to manage disk occupation myself matters less to me than losing drive or CPU performance.
So, mlord, have you tested this scenario? Will it work? And, if it works, can I assume that it will not affect drive or CPU performance, at least for write operations, as I will be writing to the individual drive mounts and not to the pooled mhddfs mount?
And, as my last question: will files written directly to one of the individual drive mounts show up immediately in the pooled mhddfs mount?
Thanks in advance
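The "no saved state" description quoted above suggests why direct writes to a member drive should be visible through the pool right away: if every access is resolved against the member directories at the moment it happens, there is nothing to fall out of sync. The sketch below models that idea in plain Python. It is an illustration of a stateless union lookup, not mhddfs's actual code, and the function names are hypothetical.

```python
import os


def resolve(members: list, relpath: str):
    """Return the first member path that holds relpath, or None.

    Nothing is cached: each call looks at the member directories afresh.
    """
    for root in members:
        candidate = os.path.join(root, relpath)
        if os.path.lexists(candidate):
            return candidate
    return None


def list_pool(members: list, reldir: str = "") -> list:
    """Merged directory listing across all members, again computed on demand."""
    names = set()
    for root in members:
        directory = os.path.join(root, reldir)
        if os.path.isdir(directory):
            names.update(os.listdir(directory))
    return sorted(names)
```

In this model a file created directly under one member directory is found by the very next `resolve` or `list_pool` call, because no index has to be refreshed first.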
|
#356809 - 14/12/2012 21:37
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: xmodder]
|
carpal tunnel
Registered: 08/03/2000
Posts: 12338
Loc: Sterling, VA
|
I cannot comment on the topic, but I wanted to welcome you to the forum. Welcome!
_________________________
Matt
|
#356810 - 15/12/2012 01:18
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: xmodder]
|
carpal tunnel
Registered: 29/08/2000
Posts: 14493
Loc: Canada
|
I know nothing about SnapRAID. But my mythbackend box is the one using mhddfs, and it works swimmingly well without me having to fuss over it all. That box has a Core2duo CPU (2X @2.6GHz), and the minimal CPU hit of mhddfs has never been the slightest issue or concern. The NFS server in the box delivers over 100MBytes/sec to other machines on the GigE LAN from the mhddfs array.
No form of RAID is a backup, and neither is mhddfs. Sure, it is far more dead-drive tolerant than any RAID, but I still recommend that everyone have at least one extra copy of their data (aka. a "backup").
Cheers
|
#356811 - 15/12/2012 01:29
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: mlord]
|
carpal tunnel
Registered: 29/08/2000
Posts: 14493
Loc: Canada
|
I should add that, on my mythbackend box, the "Recordings" directory is set to a specific filesystem/drive of the array, not going through mhddfs. I believe you were asking about this specific point. It turns out that it *could* use mhddfs after all (performance is a non-issue), but I _like_ having the recordings all on one drive rather than spread over the array. My box holds mostly (static) video content rather than (dynamic) recordings.
|
#357085 - 11/01/2013 10:05
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: mlord]
|
new poster
Registered: 14/12/2012
Posts: 2
|
> I should add that, on my mythbackend box, the "Recordings" directory is set to a specific filesystem/drive of the array, not going through mhddfs. I believe you were asking about this specific point. It turns out that it *could* use mhddfs after all (performance is a non-issue), but I _like_ having the recordings all on one drive rather than spread over the array. My box holds mostly (static) video content rather than (dynamic) recordings.
Ok, thanks a lot for your answer, mlord. That's all I wanted to know. Actually, I was thinking that a decent CPU should cope well with the overhead imposed by mhddfs but, as I have not tested it, I wanted to know whether I would have an alternative in case writing to the mhddfs pool turned out to be slow.
My NAS will also hold mainly static video/audio content; the only regular writes will be downloads from a BitTorrent client and recordings from the MythTV server on the same machine. I also want to write to this NAS from other machines on the network, to copy new video and audio content from time to time, but it will mainly be used to stream live TV from the MythTV server and to stream video and audio to XBMC clients on the network.
So, I think this setup is very similar to yours, and if it is working fine for you, it should work fine for me too.
Best regards
|
#357591 - 18/02/2013 17:06
Re: mhddfs -- Pooling filesystems together for bulk storage
[Re: mlord]
|
carpal tunnel
Registered: 29/08/2000
Posts: 14493
Loc: Canada
|
> Yeah, so you are probably better off without that patch, then. It must have a bug in there of some kind.
Well, the underlying filesystems are now full enough here that I can see that mhddfs actually is working to distribute files across both arrays. So perhaps there isn't "a bug in there" after all!
Here's how both of the filesystems get mounted here. First, the main MythTV storage hierarchy. For this filesystem, capacity balancing isn't forced on until a drive dips below 400GB free.
/usr/bin/mhddfs /drive[123] /drives -o allow_other,auto_cache,max_write=4194304,uid=1000,gid=1000,mlimit=400G
And here is the backup array, mounted only when needed. Capacity balancing is set to kick in even later here, at the 200GB per-member threshold:
mhddfs /.bkp/drive[1234] /.bkp/drives -o rw,allow_other,auto_cache,max_write=4194304,uid=1000,gid=1000,mlimit=200G
|