recovery - mdadm always degraded on reboot - how to solve?
2014-07
Whenever I reboot, my RAID6 array is always degraded, and I have to force add two drives (seen here after re-adding):
Personalities : [raid6] [raid5] [raid4] [raid1] [raid10] [raid0] [linear] [multipath]
md127 : active raid6 sdf1[6] sdd1[4] sdb1[0] sdc1[5]
3907023872 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/2] [U__U]
[>....................] recovery = 0.4% (9160908/1953511936) finish=294.9min speed=109862K/sec
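(For the record, the force-add I run each time looks roughly like this; the device names are placeholders for whichever two drives dropped out:)
sudo mdadm /dev/md127 --add /dev/sdX1
sudo mdadm /dev/md127 --add /dev/sdY1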
I'm not sure why this happens, but in a similar vein, one of my other (non-RAID) hard-drives is never mounted successfully either, and I have to mount it after startup. dmesg seems to report these hard-drives spinning up after all the basic file systems have loaded.
Anyone know what I should be trying?
SuperUser won't let me post the entirety of my dmesg, so here's a pastebin: http://bpaste.net/raw/179955/
I have an mdadm/lvm2 volume with 4 HDs that I created in Ubuntu 10.04. I just upgraded the computer to Ubuntu 10.10.
I re-ran the mdadm commands to get the volume up and running, and did mdadm --detail --scan > /etc/mdadm/mdadm.conf to generate the configuration file.
But now, every time I reboot, it tells me that the volume is not ready. /proc/mdstat says that I always have one disk of the volume "inactive" as md_d127. I need to stop this volume and reassemble the whole thing to get it working.
This is what I get out of mdadm --detail --scan and put inside /etc/mdadm/mdadm.conf:
ARRAY /dev/md127 level=raid5 num-devices=4 metadata=01.02 name=:r0 UUID=7610a895:a54fe65b:c9876d2a:67f4a179
And this is my /proc/mdstat on boot:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : inactive sdb1[2](S) sdd1[0](S) sda1[4](S)
2930279595 blocks super 1.2
md_d127 : inactive sdc1[1](S)
976759865 blocks super 1.2
unused devices: <none>
I need to do mdadm -S /dev/md_d127, mdadm -S /dev/md127, mdadm -A --scan to get this volume working again.
What's going on? This did not happen with Ubuntu 10.04. I'm really fearing the loss of my raid5 data now.
The issue is that the updated version of mdadm relies on the mdadm.conf present in your initrd, which is probably not accurate/complete. To verify its contents, do this:
gunzip -c /boot/initrd.img-2.6.38-11-generic | cpio -i --quiet --to-stdout etc/mdadm/mdadm.conf
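You can compare whatever that prints against the UUID of the running array (assuming it is currently assembled as /dev/md127); if the initrd copy is empty or shows a different UUID, that stale copy is what boot-time assembly is working from:
sudo mdadm --detail /dev/md127 | grep UUID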
If the initrd's copy doesn't contain accurate ARRAY entries, mdadm will try to use the name configured in the superblock as the link name under /dev/md/, which will link to something like /dev/md127. This obviously does not match the earlier behavior.
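If you're curious what name is actually stored in the superblock, mdadm --examine on one of the member partitions will show it (the device below is just one of the members from the question):
sudo mdadm --examine /dev/sdb1 | grep Name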
Rather than directly using mdadm -Ds or mdadm -Es to generate /etc/mdadm/mdadm.conf, it's probably better to use the /usr/share/mdadm/mkconf script:
sudo /usr/share/mdadm/mkconf force-generate /etc/mdadm/mdadm.conf
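After it runs, a quick look at the generated ARRAY lines will confirm that every array you expect is actually listed:
grep ^ARRAY /etc/mdadm/mdadm.conf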
The most important step is to rebuild your initramfs to include the updated configuration:
sudo update-initramfs -u
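Once that finishes, repeating the check from the top of this answer should show the new initramfs carrying the updated file (adjust the kernel version to match yours):
gunzip -c /boot/initrd.img-$(uname -r) | cpio -i --quiet --to-stdout etc/mdadm/mdadm.conf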
Actually, thanks to the magic in /usr/share/initramfs-tools/hooks/mdadm, /usr/share/mdadm/mkconf will be run automatically if /etc/mdadm/mdadm.conf does not exist or contains no arrays. If it exists and contains only a subset of your active arrays, a warning is displayed for each missing array, and you should manually generate a new mdadm.conf.
I've resorted to reformatting the entire array. This works in Ubuntu 10.10.
sudo mdadm -C /dev/md0 -l 5 -n 4 -e 1.2 /dev/sd[bcde]1
sudo mdadm -Ds | sudo tee /etc/mdadm/mdadm.conf
sudo pvcreate /dev/md0
sudo vgcreate vg0 /dev/md0
sudo lvcreate vg0 --name lv0 --extents '100%FREE'
sudo mkfs.ext4 /dev/vg0/lv0
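Once the filesystem is created, the new logical volume can be mounted wherever you like; the mount point below is just an example:
sudo mkdir -p /mnt/raid
sudo mount /dev/vg0/lv0 /mnt/raid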
You may also check that udev is loading mdadm.
Look for /lib/udev/rules.d/85-mdadm.rules; make sure that it has something like this:
# This file causes block devices with Linux RAID (mdadm) signatures to
# automatically cause mdadm to be run.
# See udev(8) for syntax
SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid*", \
RUN+="/sbin/mdadm --incremental $env{DEVNAME}"
If not, copy this into /etc/udev/rules.d/85-mdadm.rules (note: /etc, NOT /lib).
Please edit metadata=01.02 to metadata=1.02 in the ARRAY line, because the output of
mdadm --detail --scan > /etc/mdadm/mdadm.conf
isn't completely correct.
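A quick way to fix it in place, assuming the file only needs that one change, is something like:
sudo sed -i 's/metadata=01\.02/metadata=1.02/' /etc/mdadm/mdadm.conf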