Debian/3ware 9500 - RAID Failure?

2014-04-05
  • Gargravarr

    I swapped the hard disks from my home server into a new case last night (new mobo/CPU/RAM) and transferred its 3ware 9500 PCI-X SATA RAID card with it. The machine has 4 disks configured in 2 RAID1s - root (500GB) and media (1TB). It runs Debian 7 32-bit.
    The machine booted up fine, but only once it was running and the 3ware utilities had loaded did I notice that one of the root disks was missing from the array. I shut it down and jiggled the disks around (I know this disk is temperamental, I need to buy a new one) and eventually got the RAID card to see it. This meant kicking off a RAID rebuild, so I let the machine reboot into Debian to keep an eye on the rebuild's progress.
    It ran well into the 90% range before I had to go do something else. When I came back, disaster: the RAID card showed no RAIDs or disks present. Debian was still running (somehow!) but I couldn't do anything. The media volume was gone and the root FS seemed to be completely corrupt; bash was interpreting system binaries as random strings of numbers. There was nothing left to do but reboot (and that had to be a hard reboot, as the shutdown command didn't work).
    After the reboot, the RAID card listed all 4 disks as present but showed the problem disk as still Not In Use, meaning a manual RAID rebuild was still necessary. When GRUB came up, it declared that it couldn't recognise the filesystems on either of my RAIDs and dropped me into the recovery shell. I don't know how to use this (my internet was down last night, too), so I have no idea what state my disks are in. I removed the good root disk and tried to bring the system up in degraded mode on the temperamental disk to see if there was any usable data on it, but the RAID card refused to let me use it as a boot medium.
    If necessary I have a USB-SATA adapter, but I am wondering at this stage what my chances of recovering the system are. I know RAID's no substitute for backup, but there's a lot of data on these disks that would take me a very long time to rebuild (that is, it's not irreplaceable, but I don't want to wipe the system and start afresh). Any ideas where I could start?

    Edit: made some progress. Looks like the 'good' disk out of the root pair suffered hardware failure midway through the rebuild! I tried to dd it to a new disk and got IO errors, and SpinRite doesn't want to touch it. The 'bad' disk is in some kind of limbo, but one of the partitions on there passed fsck and mounts in a live disc, so I'm dd'ing that to a spare disk. It doesn't explain why the other two media disks disappeared, but salvaging the /home partition is a great start.
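
    For anyone attempting a similar salvage copy from a disk that throws IO errors, here is a minimal Python sketch of the general idea: read the source in fixed-size blocks, zero-fill any block that cannot be read, and keep going rather than aborting. This is not what was actually run above (that was plain dd). The function names rescue_copy and _read_block and the 4096-byte block size are illustrative assumptions.

```python
import os


def _read_block(f, offset, length):
    """Read up to `length` bytes starting at `offset`; may raise OSError."""
    f.seek(offset)
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = f.read(remaining)
        if not chunk:  # end of file/device reached early
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def rescue_copy(src_path, dst_path, block_size=4096):
    """Copy src to dst block by block, zero-filling unreadable blocks.

    Returns the byte offsets of the blocks that raised a read error.
    The names and block size are hypothetical, chosen for illustration.
    """
    bad_offsets = []
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        # Seeking to the end gives the size for regular files and, on Linux,
        # for block devices as well.
        total = src.seek(0, os.SEEK_END)
        offset = 0
        while offset < total:
            want = min(block_size, total - offset)
            try:
                data = _read_block(src, offset, want)
            except OSError:
                data = b""
                bad_offsets.append(offset)
            # Pad failed or short reads so later data keeps its offset.
            if len(data) < want:
                data += b"\x00" * (want - len(data))
            dst.write(data)
            offset += want
    return bad_offsets
```

    Keeping every block at its original offset is what lets fsck or a mount attempt make sense of the copy afterwards; the returned offsets show how much of the image is filler.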

    Edit 2: something very strange is going on here. The two media disks won't show up in the BIOS on my desktop, and via USB on my laptop they both show up with no partition table. I'm starting to wonder if somehow three of these four disks have died at once, or whether they've been killed by the new hardware - if so I can't work out why, a power surge is the only thing I can think of but that should have toasted the mobo first!

    Edit 3: further to my last comment, the undetected-disks issue is apparently due to the way the 3ware card uses them; the media disks show up okay with the 3ware controller, with their partition tables intact. fsck'ing them now, fingers crossed I can get to the data...
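
    A rough way to check whether a disk (or an image of one) carries a recognisable partition table outside the controller is to inspect its first sector. The sketch below assumes a classic MBR layout with 512-byte sectors and does not handle GPT or any controller-specific metadata; parse_mbr and read_first_sector are illustrative names, not part of any 3ware tool.

```python
import struct

SECTOR_SIZE = 512  # assumption: classic 512-byte sectors


def parse_mbr(sector):
    """Return non-empty MBR partition entries, or None if no MBR signature.

    Each entry is (index, type_byte, start_lba, num_sectors).
    """
    # A valid MBR ends with the bytes 0x55 0xAA at offsets 510 and 511.
    if len(sector) < SECTOR_SIZE or sector[510:512] != b"\x55\xaa":
        return None
    partitions = []
    for i in range(4):
        # The four 16-byte partition entries start at offset 446.
        entry = sector[446 + 16 * i:446 + 16 * (i + 1)]
        ptype = entry[4]
        start_lba, num_sectors = struct.unpack_from("<II", entry, 8)
        if ptype != 0:  # type 0 marks an unused slot
            partitions.append((i, ptype, start_lba, num_sectors))
    return partitions


def read_first_sector(path):
    """Read the first sector of a disk image or device node."""
    with open(path, "rb") as f:
        return f.read(SECTOR_SIZE)
```

    Calling parse_mbr(read_first_sector(path)) on an image or device node returns None when no MBR signature is found, which would match the "no partition table" symptom seen over USB.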

    Edit 4: I was able to salvage everything from the media disks, although I had to run an extensive fsck on the partition before mounting it. There were many errors with mismatching or invalid inodes and free space counts. The weird thing is that nothing should really have been using the disks at the time. Okay, Plex Media Server was running, but since I hadn't touched any media on the drive, I don't think this could have messed with the state of the filesystem when the RAIDs went down. I'm going to try to get into the valid root disk and see what the logs say. Until I determine what caused the RAIDs to just vanish, I'm going back to software RAID.

  • Answers
  • Gargravarr

    My data was still on the disks. I have no idea what trashed them, but I'm now wary of hardware controllers. I've rebuilt the machine using software RAID and put all of the salvaged data back in place. If anyone else hits this problem, I'm afraid I can't suggest where to start.


  • Related Question

    Do newer mobos do RAID in a way that allows the disks to be viewed as normal non-RAID HDDs if transferred to another computer?
  • Lawrence Dol

    I built a new PC this weekend, and one of the joys I had was to transfer my RAID 1 array to the new system with a different mobo. The disks were not recognized at all, by either the mobo (no surprise) or Windows XP.

    Now with the disks running on the new system, out of curiosity I switched the BIOS from "RAID" to "AHCI" and booted into Windows. Much to my surprise, Computer Management showed the two HDDs as separate drives, each partitioned into 279 GB and 84 MB, and both as healthy NTFS-formatted disks. Neither was assigned a letter or mounted, and I didn't mount them because I didn't want to screw anything up.

    So my question is... is it possible that RAID 1 support on my new motherboard is formatting the primary partition on each HDD in a stock standard way, and storing any RAID specific data in the separate small partition? Can anyone say with any certainty?

    Mobo hardware is an MSI 790GX-G65, which is an AMD SB750 RAID chipset.

    PS: This matters to me because I was planning on upgrading to Windows 7 and switching to software RAID to avoid losing my data if my mobo fails and my backups are not completely up to date.


  • Related Answers
  • ta.speot.is

    is it possible that RAID 1 support on my new motherboard is formatting the primary partition on each HDD in a stock standard way, and storing any RAID specific data in the separate small partition?

    For RAID1, yes.

  • Scott McClenning

    I've always been told that if you want compatibility or portability between systems or OSs, you need an external RAID card. RAID cards are easier to replace, and it is easier to find one with a compatible chipset.