Unofficial empeg BBS


#259375 - 29/06/2005 17:05 Linux RAID 5 recovery
drakino
carpal tunnel

Registered: 08/06/1999
Posts: 7868
Well, the point that software raid only protects against a disk failure and not a power failure was driven home last night. I run a 3 disk raid 5 on a 2.6 kernel with the raidtools package. The machine suffered a power supply failure and when I brought it back up, recovery was needed.

The RAID 5 is hdf1, hdg1, and hdh1. The system complained about hdg1 not being "fresh", so I told it to rebuild. The process started fine, so I went to sleep.

The next morning the server was stuck trying to rebuild, and the logs show that it hit an I/O error on hdh1 during the recovery and kicked it out of the RAID.

So now I have a RAID 5 with one known good disk, one with a failed resync, and one that still acts OK except for what looks like a bad block. Before I go off and run this lengthy rebuild tool from here, does anyone know a way to just have the system bring the RAID 5 back online in read-only mode, ignoring a block error or two?

I found some references to ckraid and some scanning/recovery switches it used to have. Newer versions dropped these in favor of automatic recovery in the kernel, which seems to leave me with no way to attempt it manually.

The critical data from that RAID is backed up thankfully. However, that drive also contained my MP3 collection with no backup, so I'd much rather try for a recovery solution over reripping the music I just reripped 4 months ago.

#259376 - 29/06/2005 17:39 Re: Linux RAID 5 recovery [Re: drakino]
eliceo
enthusiast

Registered: 18/02/2002
Posts: 335
A very similar thing happened to me months ago. I was using mdadm to manage the RAID 5 array, I think with 5 drives. I was able to force the array to assemble without one drive. The data wasn't critical, so I just started over. Hot spares are good, and a UPS with a monitoring utility can save your data. I think similar functionality exists in raidtools?
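
For the array described in the first post, a forced assemble with mdadm might look roughly like the sketch below. It only prints the command instead of running it. The array name /dev/md0 is an assumption (the thread does not give it at this point), the helper name is made up for illustration, and it assumes mdadm can assemble an array that was created with raidtools. The stale member (hdg1) is left out, so the array would start degraded on the two remaining disks.

```shell
#!/bin/sh
# Sketch only: prints the mdadm command rather than executing it.
# /dev/md0 is an assumed array name; the member names come from the first post.
MD_DEVICE=/dev/md0

# Build a forced, read-only assemble command from the members passed in.
# --force lets mdadm assemble even if a member's metadata looks out of date;
# --readonly starts the array read-only so nothing is written to it.
force_assemble_cmd() {
    echo "mdadm --assemble --force --readonly $MD_DEVICE $*"
}

# Leave out hdg1 (the "not fresh" disk) and start degraded on the other two.
force_assemble_cmd /dev/hdf1 /dev/hdh1
```

Checking /proc/mdstat afterwards shows whether the array came up and which members it is using.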

#259377 - 29/06/2005 19:55 Re: Linux RAID 5 recovery [Re: drakino]
Attack
addict

Registered: 01/03/2002
Posts: 598
Loc: Florida
You could try a PCB swap on the drive that isn't working, just long enough to rebuild the array and copy everything off, then switch the PCB back and RMA the bad drive(s). I've done the PCB swap 4 times myself, once on a friend's TiVo.


If you can't get the RAID to rebuild and you can get your hands on two or three spare drives, you can use R-Studio (if you have a Windows PC) to make an image of each drive. R-Studio then has an option to create a RAID array from the images and restore the data.

#259378 - 29/06/2005 21:11 Re: Linux RAID 5 recovery [Re: drakino]
drakino
carpal tunnel

Registered: 08/06/1999
Posts: 7868
I found these instructions that I will be trying later. Currently I'm letting dd duplicate the "bad" hdh drive to a good one, ignoring errors. I'll then try the instructions there to get the RAID back to an operational state and run a reiserfs scan to see which files are bad.

What is a good non-destructive way of scanning for errors while a system is in operation? I'd like to have the system do surface scans from time to time, since it seems the error on the third drive might have existed for some time. Only when I started a resync did it read the bad area and trip up.
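
A dd copy that ignores read errors could look something like this sketch. The exact options used here were not posted, so the block size and the conv flags are assumptions, and the function name and the target device in the example call are placeholders.

```shell
#!/bin/sh
# Copy SRC to DST in 4 KiB blocks.
# conv=noerror tells dd to keep going after a read error.
# conv=sync pads every short or failed read with zeros up to the block
# size, so the data after a bad block stays at the same offset on the copy.
clone_ignoring_errors() {
    src="$1"
    dst="$2"
    dd if="$src" of="$dst" bs=4096 conv=noerror,sync
}

# Example call (placeholder device names; the target is overwritten):
# clone_ignoring_errors /dev/hdh /dev/hde
```

A small block size keeps the zero-filled region around each bad block small, at the cost of a slower copy.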

#259379 - 29/06/2005 23:43 Re: Linux RAID 5 recovery [Re: drakino]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14478
Loc: Canada
cat /dev/md0 >/dev/null

??

#259380 - 30/06/2005 06:32 Re: Linux RAID 5 recovery [Re: drakino]
drakino
carpal tunnel

Registered: 08/06/1999
Posts: 7868
OK, the second site's procedure worked like a charm, and data is now being moved off the RAID so I can get things back up. I'm pondering using this chance to upgrade the RAID size, since it was running low on space, though most of that is due to me being lazy and not cleaning out data hiding on the server that I really have no use for.

And Mark, thanks for the suggestion. I'll probably do a dd ignoring errors on the pure /dev/hd[x] devices though to be able to sense errors on the exact drives, instead of catting the md device.
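
Roughly what I have in mind for the per-drive read scan is sketched below. The function name and the block size are just illustrative choices. It reads the whole device and throws the data away, so nothing is written to the disk.

```shell
#!/bin/sh
# Read every block of a device (or file) and discard the data.
# conv=noerror keeps dd reading past a bad block; dd reports each
# failed read on stderr, and the kernel log names the failing drive.
read_scan() {
    dev="$1"
    dd if="$dev" of=/dev/null bs=1024k conv=noerror
}

# Example: scan each RAID member's disk in turn (names from the first post).
# for d in /dev/hdf /dev/hdg /dev/hdh; do read_scan "$d"; done
```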

#259381 - 30/06/2005 11:19 Re: Linux RAID 5 recovery [Re: drakino]
mlord
carpal tunnel

Registered: 29/08/2000
Posts: 14478
Loc: Canada
Quote:
I'll probably do a dd ignoring errors on the pure /dev/hd[x] devices though to be able to sense errors on the exact drives, instead of catting the md device.


Well in that case, use "smartctl -a -s on -t long /dev/hd[x]" for the surface scan. Way more accurate that way.

Cheers
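
Split into separate steps, the SMART scan might look like the sketch below. It is a dry run that only prints the smartctl commands for one drive, and the helper name is made up for illustration. The long self-test runs inside the drive in the background, so the result has to be read from the self-test log once it has finished.

```shell
#!/bin/sh
# Print the smartctl steps for one drive (dry run; nothing is executed).
smart_long_test_cmds() {
    dev="$1"
    echo "smartctl -s on $dev"        # enable SMART on the drive
    echo "smartctl -t long $dev"      # start the extended (long) self-test
    echo "smartctl -l selftest $dev"  # later: read the self-test log
}

smart_long_test_cmds /dev/hdf
```

Running the printed commands as root, once per drive, gives a per-disk result without touching the md device.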
