Unofficial empeg BBS

#356509 - 25/11/2012 10:56 Classic one: bitten by RAID :(
julf
veteran

Registered: 01/10/2001
Posts: 1307
Loc: Amsterdam, The Netherlands
Ouch. How much more textbook can you get?

Have a 4-drive RAID server as main storage (and staging post for long-term backups - yes, RAID is not a backup, except temporarily). So last week I got a couple of SMART notices of recoverable errors on one of the disks. Time to replace it... Got the new disk, pulled the old one - but got bitten by the usual inconsistent mapping of logical to physical drives, so pulled the wrong one. And, having a brain fart, put it back. Rebuild triggered.

Oh well, the failing disk hasn't actually failed yet, so I just have to wait for the rebuild to finish before replacing the right disk. Except... Yes, you know where this is going... At "96.7% complete" there is a non-recoverable bad sector, and the second disk gets failed out of the array, leaving the rebuild incomplete. :(

OK, now running ddrescue to recover as much as I can from the disk with bad sectors before I do anything else...
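
For reference, it's roughly this (device names are placeholders for the failing disk and a blank spare, obviously):

Code:
# first pass: copy everything readable, skip the slow scraping of bad areas
ddrescue -f -n /dev/sdX /dev/sdY rescue.map
# second pass: go back and retry the bad areas a few times
ddrescue -f -r3 /dev/sdX /dev/sdY rescue.map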

#356512 - 25/11/2012 12:15 Re: Classic one: bitten by RAID :( [Re: julf]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14496
Loc: Canada
Then patch the kernel (Linux, right?) to just ignore the bad sector and continue, instead of voiding the entire fricken array, and you can then recover nearly all of the data.

Next time, use unRAID rather than RAID. Or mhddfs.
Or *something* (anything) other than horribly unrobust RAID.
It simply is not suitable for huge TB+ drives at home.
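
A less drastic dodge, assuming the drive still mostly works, is to make the drive itself remap the bad sector so the rebuild can get past it - something along these lines (sector number and device invented, and untested on your setup):

Code:
# confirm the sector really is unreadable
hdparm --read-sector 123456789 /dev/sdX
# overwrite it with zeros, forcing the drive to remap it
# (destroys the 512 bytes in that sector, of course)
hdparm --yes-i-know-what-i-am-doing --write-sector 123456789 /dev/sdX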

Cheers

#356514 - 25/11/2012 12:47 Re: Classic one: bitten by RAID :( [Re: mlord]
julf
veteran

Registered: 01/10/2001
Posts: 1307
Loc: Amsterdam, The Netherlands
Originally Posted By: mlord
Then patch the kernel (Linux, right?) to just ignore the bad sector and continue, instead of voiding the entire fricken array, and you can then recover nearly all of the data.


Yes, definitely tempted. But not entirely trivial (until now I have had no need to look at the kernel RAID code).

Quote:
Next time, use unRAID rather than RAID. Or mhddfs.
Or *something* (anything) other than horribly unrobust RAID.
It simply is not suitable for huge TB+ drives at home.


Have to agree - it's the classic problem of starting out with something that worked OK under the conditions prevailing at the time, and then making small upgrades without ever biting the bullet and replacing the whole thing...

#356515 - 25/11/2012 13:18 Re: Classic one: bitten by RAID :( [Re: julf]
peter
carpal tunnel

Registered: 13/07/2000
Posts: 4181
Loc: Cambridge, England
Originally Posted By: julf
pull the wrong one. And having a brain fart, put it back. Rebuild triggered

Why a brain fart? It was game over then anyway, wasn't it? Unless your array has a "whoops, sorry, didn't mean to eject that" button (which seems unlikely, especially if any writes have happened in the interim), even copying the degraded array off to a known good location would have failed 96.7% of the way through reading the duff disk.

Peter

#356516 - 25/11/2012 13:47 Re: Classic one: bitten by RAID :( [Re: peter]
julf
veteran

Registered: 01/10/2001
Posts: 1307
Loc: Amsterdam, The Netherlands
Originally Posted By: peter
Why a brain fart? It was game over then anyway, wasn't it?


Well, a very quick stop/unmount of the array might just have saved the situation, if there were no dirty buffers to write out...

Also, the disk that got pulled was of course 100% OK at that point - it was just that the other disks considered it failed/unclean. Patching that (instead of allowing reconstruction) would probably have been a way out.
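
In md terms I suppose that would have been a stop followed by a forced assemble, which tells md to take back a member with a stale event count instead of kicking it out - something like this (array and member names invented):

Code:
mdadm --stop /dev/md0
# --force accepts a member whose event count is slightly behind,
# instead of treating it as failed
mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1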

#356525 - 26/11/2012 14:41 Re: Classic one: bitten by RAID :( [Re: mlord]
julf
veteran

Registered: 01/10/2001
Posts: 1307
Loc: Amsterdam, The Netherlands
Originally Posted By: mlord
Next time, use unRAID rather than RAID. Or mhddfs.


I guess unRAID is not available just as a file system - you have to run the complete dedicated server/utility OS?

I don't see how mhddfs solves the "reliable redundancy" issue.

Might have to look into ZFS...
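
Something like raidz2 would at least survive two failed drives (pool and device names invented):

Code:
# four-disk pool with two disks' worth of parity
zpool create tank raidz2 /dev/sda /dev/sdb /dev/sdc /dev/sdd
zpool status tank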

#356541 - 26/11/2012 21:21 Re: Classic one: bitten by RAID :( [Re: julf]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14496
Loc: Canada
Originally Posted By: julf
I don't see how mhddfs solves the "reliable redundancy" issue.

It doesn't. It solves the "make one big filesystem from a bunch of drives" problem, without losing everything when one drive goes bad.
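
It's just a FUSE overlay on top of filesystems you already have mounted, e.g. (mount points invented):

Code:
# pool three already-mounted drives into one big tree;
# each file lives entirely on one drive
mhddfs /mnt/disk1,/mnt/disk2,/mnt/disk3 /mnt/pool -o allow_other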

unRAID is similar, except it adds a parity drive in parallel with the data drives, permitting loss of a single drive with no data loss. And if more drives fail after that, you lose only their contents, not everything.

Yeah, pity unRAID wants to be standalone (or in a VM).

#356546 - 27/11/2012 11:39 Re: Classic one: bitten by RAID :( [Re: mlord]
julf
veteran

Registered: 01/10/2001
Posts: 1307
Loc: Amsterdam, The Netherlands
Right, ZFS looks like the best solution right now.

Anyway, I am a happy bunny - (g)ddrescue managed, after a couple of tries, to read all blocks off the failing disk. Replaced the failing disk with the ddrescued copy, forced a resync, and everything is hunky dory again. For now.
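
For the record, the forced resync was just a matter of poking md once the clone was in place (md device invented):

Code:
# tell md to rewrite/verify redundancy across the array
echo repair > /sys/block/md0/md/sync_action
# and watch progress
cat /proc/mdstat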

#356548 - 27/11/2012 16:14 Re: Classic one: bitten by RAID :( [Re: julf]
drakino
carpal tunnel

Registered: 08/06/1999
Posts: 7868
Please share your experience with ZFS when you implement it. I've been thinking of doing something similar here, but haven't implemented it yet. There's even a ZFS stack for OS X that has been evolving a bit, which I'm following.

#356552 - 27/11/2012 17:18 Re: Classic one: bitten by RAID :( [Re: drakino]
julf
veteran

Registered: 01/10/2001
Posts: 1307
Loc: Amsterdam, The Netherlands
Will do!

#356553 - 27/11/2012 18:46 Re: Classic one: bitten by RAID :( [Re: drakino]
andy
carpal tunnel

Registered: 10/06/1999
Posts: 5916
Loc: Wivenhoe, Essex, UK
I haven't tried the OS X ZFS stack myself, but this is a quote from a friend who tried it back in June:

"Currently shuffling all the data off my Zevo zfs formatted drive on my MBP so I can reformat it back with bad old HFS+

"I can cause system crashes that are directly caused by Zevo not handling particularly intensive bouts of disk activity on large numbers of tiny files. Which pretty much describes how Aperture hits its database. Which is pretty mission critical for me - could do without any instability, least of all something that hits my photo databases."
_________________________
Remind me to change my signature to something more interesting someday
