The uncertainty is not whether your hard drives will fail. The question is when they will fail. The answer: at the absolute most inconvenient moment.
Linux Software RAID
This describes how to use Linux to configure inexpensive disks in redundant arrays without any proprietary hardware. The huge advantage of this technique is that there is no RAID controller to fail, leaving your data stranded in a striping pattern that only that controller knows about.
Warning
If your motherboard has a RAID "feature", I would not use it. If your motherboard dies, or any component fails and renders it inoperable, your data could easily be trapped until you track down a completely identical motherboard on eBay. Linux software RAID does not have this problem.
Clearing Stupid Windows Cruft
So you’ve decommissioned an old Windows machine and you want to use it in your proper Linux system. You dd from /dev/zero all over it and are ready to format it. But then you find that there’s a weird and tenacious entry in lsblk or /dev/mapper/ belonging to it. How do you get Linux to stop worrying about it?
sudo dmsetup remove /dev/mapper/pdc_bebdacbjhg
Problem solved.
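If you don’t know the exact name to remove, you can list what device-mapper is holding first. A minimal sketch; the pdc_… name is just the one from the example above (stale fakeraid mappings typically have prefixes like pdc_ or isw_):
# List all current device-mapper mappings
sudo dmsetup ls
# Remove the offending fakeraid mapping by name
sudo dmsetup remove pdc_bebdacbjhg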
Setup
Set up partitions to use the partition type "fd" which is "Linux raid autodetect".
Boot Start End Blocks Id System
/dev/sda1 * 1 25 200781 fd Linux raid autodetect
/dev/sda2 26 269 1959930 82 Linux swap / Solaris
/dev/sda3 270 5140 39126307+ fd Linux raid autodetect
/dev/sda4 5141 10011 39126307+ fd Linux raid autodetect
Here I am making a boot partition, a swap partition, and two identically sized data partitions. These can be made into RAID volumes with each other or across other physical drives. I leave the swap as swap because I don’t think it’s particularly useful to have swap space RAID protected.
Once you have the partition table the way you want it on one physical drive, you can clone it to another by doing something like this:
sfdisk -d /dev/sda | sfdisk /dev/sdb
Note
Double check that this trick works; I suspect it may no longer work with this exact syntax.
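If the sfdisk dump trick doesn’t work any more, or the disks use GPT, sgdisk (from the gdisk package) can do the same job. A sketch I have not verified on this setup:
# Replicate sda's partition table onto sdb
sgdisk --replicate=/dev/sdb /dev/sda
# Give sdb its own random GUIDs so the clone isn't an identical twin
sgdisk --randomize-guids /dev/sdb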
You need to have Linux RAID support enabled in the kernel. From an install disk you might need to:
modprobe md-mod # Or simply `modprobe md`
modprobe raid1
(Or whatever RAID level you’re after.) In the kernel config, the option lives here:
Device_Drivers->Multi-device_support_(RAID_and_LVM)->RAID_support->RAID-1_(mirroring)_mode
You might also need this if it’s not already there:
# emerge -av sys-fs/mdadm
Now set up some md devices (md = multiple devices):
livecd ~ # mknod /dev/md1 b 9 1
livecd ~ # mknod /dev/md2 b 9 2
livecd ~ # mknod /dev/md3 b 9 3
livecd ~ # mknod /dev/md4 b 9 4
Time to actually set up the RAID devices:
livecd ~ # mdadm --create --verbose /dev/md1 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm: size set to 200704K
mdadm: array /dev/md1 started.
livecd ~ # mdadm --create --verbose /dev/md3 --level=1 --raid-devices=2 /dev/sda3 /dev/sdb3
mdadm: size set to 39126208K
mdadm: array /dev/md3 started.
livecd ~ # mdadm --create --verbose /dev/md4 --level=1 --raid-devices=2 /dev/sda4 /dev/sdb4
mdadm: size set to 39126208K
mdadm: array /dev/md4 started.
Note
If you’re making a RAID1 set which has only one partition which needs to be bootable, you might think about including --metadata=1.0 so that the RAID’s metadata does not conflict with the partition’s. Apparently this can cause all kinds of booting problems and incompatibilities. It might just be safer to have a boot partition that is not RAID and mirror it manually. This is reasonable since it shouldn’t be too dynamic and will guarantee a booting machine regardless of the kernel’s ability to auto-assemble RAID during initialization.
Here’s an example of a serious RAID setup:
mdadm --create --verbose /dev/md3 --level=5 --raid-devices=22 /dev/sdc \
/dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj \
/dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq \
/dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx
Here’s one with a hot spare:
mdadm --verbose --create /dev/md3 --level=5 --raid-devices=7 \
    --spare-devices=1 /dev/sdd /dev/sde /dev/sdf /dev/sdc /dev/sdg \
    /dev/sdi /dev/sdj /dev/sdh
Note that if the RAID volumes were already set up, just make the devices with mknod and then, instead of "creating" the RAID components, "assemble" them like this:
mdadm --assemble --verbose /dev/md4 /dev/sda4 /dev/sdb4
Note that if the RAID volumes were already set up and there are redundant disks, it might happen that the RAID volume starts without its redundant mirror. For example, if you find:
md1 : active raid1 sda1[0]
521984 blocks [2/1] [U_]
And you know that it should be being mirrored with /dev/sdb1, then do this:
# mdadm /dev/md1 --add /dev/sdb1
And then look for this:
md1 : active raid1 sdb1[1] sda1[0]
521984 blocks [2/2] [UU]
This seems to finish right away, but it actually takes a long time for the mirror to fully sync. Presumably this can be done with a drive full of data, and this process copies everything.
Checking
You can check up on a running array with this.
watch -n 1 cat /proc/mdstat
Note that a better way to get more information about what is going on with a RAID array is through mdadm:
mdadm --query --detail /dev/md3
Or
mdadm --misc --detail /dev/md0
For information about the components of a RAID set use examine.
mdadm --misc --examine /dev/sd[ab]
To stop a RAID array basically means to have the system stop worrying about it:
mdadm --misc --stop /dev/md3
This is useful if you start an array but something goes wrong and you need to reuse the disks that are part of the aborted or inactive or unwanted array.
Since this is Linux, you can get an absurd amount of information about the status of your RAID volume. Check out the virtual /sys/block/md1/... directory tree and explore very fine details of the setup. Read about what all that good stuff is in the kernel’s md.txt documentation.
# cat /sys/block/md1/md/array_state
clean
Once you’re comfortable with checking on the status of the RAID, a good thing to do is to actively test it for behavior under stress. This is a good guide to testing RAID setups.
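For example, on an array you can afford to abuse, you can deliberately fail a member and watch the kernel cope. A sketch using the md3 mirror from the setup above; obviously don’t do this to an array you care about:
# Pretend /dev/sdb3 just died
mdadm /dev/md3 --fail /dev/sdb3
cat /proc/mdstat                 # should now show [2/1] [U_]
# Pull the failed member out, put it back, and watch the rebuild
mdadm /dev/md3 --remove /dev/sdb3
mdadm /dev/md3 --add /dev/sdb3
watch -n 1 cat /proc/mdstat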
mdadm.conf
For temporary purposes, you can set up an /etc/mdadm.conf file like this:
mdadm --detail --scan > /etc/mdadm.conf
but for a more permanent installation (once the OS is installed), this is a good format for this file:
DEVICE /dev/sda*
DEVICE /dev/sdb*
ARRAY /dev/md1 devices=/dev/sda1,/dev/sdb1
#ARRAY /dev/md2 devices=/dev/sda2,/dev/sdb2
ARRAY /dev/md3 devices=/dev/sda3,/dev/sdb3
ARRAY /dev/md4 devices=/dev/sda4,/dev/sdb4
MAILADDR sysnet-admin@sysnet.ucsd.edu
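Note that the devices= style above ties each array to particular device names, which can shuffle around between boots. ARRAY lines keyed by UUID are more robust, and mdadm --detail --scan prints them in that form. A sketch; the UUID here is a placeholder, not a real one:
ARRAY /dev/md1 UUID=00000000:00000000:00000000:00000000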
Now create the filesystems and the swap areas:
livecd ~ # mkfs.ext2 /dev/md1
livecd ~ # mkswap /dev/sda2
livecd ~ # mkswap /dev/sdb2
livecd ~ # swapon /dev/sd[ab]2
livecd ~ # mkfs.ext3 /dev/md3
livecd ~ # mkfs.ext3 /dev/md4
Now install the bootloader on both drives so that either disk can boot:
emerge grub
grub --no-floppy
Set up the MBR on /dev/sda:
root (hd0,0)
setup (hd0)
Set up the MBR on /dev/sdb:
device (hd0) /dev/sdb
root (hd0,0)
setup (hd0)
quit
Note
Gentoo people should see if grub needs to be compiled with the device-mapper USE flag. Gentoo people should also emerge -avuD mdadm at some point.
Don’t forget that your kernel will want a root=/dev/md3 parameter.
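For GRUB Legacy that means a grub.conf stanza along these lines. A sketch only: the kernel file name is hypothetical, (hd0,0) is the boot partition made above, and the kernel path is relative to that separate /boot partition:
title Linux on RAID1
root (hd0,0)
kernel /vmlinuz-x.y.z root=/dev/md3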
Recovery
So imagine that you have some error like this:
[hb.xed.ch][~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdd2[1] sdc2[0]
1951808 blocks [2/2] [UU]
md2 : active raid1 sdd3[1]
78268608 blocks [2/1] [_U]
md0 : active raid1 sdd1[1] sdc1[0]
192640 blocks [2/2] [UU]
The second partition on the first disk is dead and not being used. Install a new disk; the blank drive will then show up with no usable partitions.
The partition table has to be recreated. Make sure you get this right!!
:-> [swamp][~]$ fdisk -l /dev/sdc
Disk /dev/sdc: 82.3 GB, 82348277760 bytes
255 heads, 63 sectors/track, 10011 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/sdc1 * 1 25 200781 fd Linux raid autodetect
/dev/sdc2 26 269 1959930 82 Linux swap / Solaris
/dev/sdc3 270 5140 39126307+ fd Linux raid autodetect
/dev/sdc4 5141 10011 39126307+ fd Linux raid autodetect
:-> [swamp][~]$ fdisk -l /dev/sdd
Disk /dev/sdd: 82.3 GB, 82348277760 bytes
255 heads, 63 sectors/track, 10011 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/sdd doesn't contain a valid partition table
From this you can see that sdd is the blank drive - no doubt about it. So don’t copy sdd’s table to sdc; that would be bad. Note also that the drives are identical in size. This is good. In fact, it might be smart to leave a few percent unused when partitioning in the first place, to account for the difference in nominal size between various brands.
Rebuild the partition table using the same technique used at installation:
sfdisk -d /dev/sdc | sfdisk /dev/sdd
Double check:
fdisk -l /dev/sdc; fdisk -l /dev/sdd
Set up swap space on the other volume and go ahead and use it:
:-> [swamp][~]$ mkswap /dev/sdd2
Setting up swapspace version 1, size = 2006962 kB
no label, UUID=81258423-4245-431b-8fb9-137e9651e7dd
:-> [swamp][~]$ swapon /dev/sdd2
Resetting Old RAID Partitions
Sometimes you might run one drive for a while and then introduce its matching RAID mirror, and the mirror will spontaneously associate with an md device. Here’s how to clear that before adding it back to the real md set.
If I add hdc1 and it shows an md0 that shouldn’t exist, like this:
md0 : active raid1 hdc1[1]
104320 blocks [2/1] [_U]
Mark the member as failed:
# mdadm /dev/md0 --fail /dev/hdc1
Or if the array is running and you just don’t want it to be:
# mdadm --stop /dev/md0
And then remove the failed member:
# mdadm /dev/md0 --remove /dev/hdc1
This actually didn’t seem to work! I just fixed the mdadm.conf and rebooted. Maybe I needed to:
# mdadm --zero-superblock /dev/sda
# mdadm --zero-superblock /dev/sdb
Sometimes none of this works! You have to use dd. You can overwrite the whole thing, but try this first; it wipes just the last 1024 sectors, which is where the superblock lives for 0.90 and 1.0 metadata.
DEVICE=/dev/sdn
SECTORS=$(blockdev --getsz $DEVICE)
dd if=/dev/zero of=${DEVICE} bs=512 seek=$(( ${SECTORS} - 1024 )) count=1024
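On newer systems wipefs (from util-linux) can also find and clear RAID signatures without computing offsets by hand. A sketch; the same warning about double-checking the device name applies:
# Show any filesystem/RAID signatures it can find
wipefs ${DEVICE}
# Erase all detected signatures (destructive)
wipefs -a ${DEVICE}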
Do the recovery mirroring
Do the small easy one first:
:-> [swamp][~]$ mdadm --manage /dev/md1 --add /dev/sdd1
mdadm: added /dev/sdd1
Now check to see that it’s working:
:-> [swamp][~]$ cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdd1[2] sdc1[0]
200704 blocks [2/1] [U_]
[===================>.] recovery = 98.4% (198912/200704) finish=0.0min speed=39782K/sec
Check again to see it complete:
:-> [swamp][~]$ cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdd1[1] sdc1[0]
200704 blocks [2/2] [UU]
Do the rest:
:-> [swamp][~]$ mdadm --manage /dev/md3 --add /dev/sdd3
:-> [swamp][~]$ mdadm --manage /dev/md4 --add /dev/sdd4
These things go sequentially, so don’t panic if one waits for the other to finish. If you issue these commands, they’ll work eventually.
You can keep an eye on it:
watch cat /proc/mdstat
That might look like:
raven ~ # cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 hdc1[1] hda1[0]
104320 blocks [2/2] [UU]
md3 : active raid1 hdc3[2] hda3[0]
9775488 blocks [2/1] [U_]
[===>.................] recovery = 18.0% (1767424/9775488) finish=4.8min speed=27328K/sec
md4 : active raid1 hdc4[2] hda4[0]
145910272 blocks [2/1] [U_]
resync=DELAYED
unused devices: <none>
Go ahead and put the bootloader in the MBR on the new drive using the procedure outlined above.
Note
If you’re thinking about replacing anything while powered up, check out this information about what hardware is ok with hot swapping.
Mounting a partition that used to be in a RAID set
Maybe you’ve decommissioned a pair of mirrored RAID drives and you’re thinking of reusing them but you want to see what was on them to make sure it’s nothing important. Here is how to simply mount this drive on a different Linux box. You can check to see that the drive is plugged in and recognized somewhere with:
$ cat /proc/partitions
major minor #blocks name
3 0 156290904 hda
3 1 104391 hda1
3 2 498015 hda2
22 0 156290904 sdb
22 1 104391 sdb1
22 2 498015 sdb2
There it is, sdb. But if you try:
$ mount /dev/sdb1 /mnt/b/1
You get:
mount: unknown filesystem type 'mdraid'
It doesn’t work like that. You need to make sure all your kernel modules are happy for RAID (having them compiled into the kernel works too):
# modprobe md
# modprobe raid1
The next thing to have ready is mdadm. This can be installed with apt install mdadm, but I found that it installed a lot of serious cruft that I didn’t want (email stuff, MySQL, etc). I found that to just get a quick easy mdadm, this was effective.
apt install libudev-dev
git clone git://git.kernel.org/pub/scm/utils/mdadm/mdadm.git
cd mdadm
make
./mdadm --version
If that’s good and mdadm is installed, then you can do:
# sudo mdadm --assemble --run /dev/md1 /dev/sdb1
# sudo mount /dev/md1 /mnt/raid
And you’re looking at the contents.
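If you’re worried about the act of assembling modifying anything on the old mirror, mdadm can assemble the array read-only and the mount can be read-only too. A sketch:
# Assemble without allowing writes or resync activity
sudo mdadm --assemble --readonly --run /dev/md1 /dev/sdb1
sudo mount -o ro /dev/md1 /mnt/raid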
md127 WTF?
Recently I’ve been having a lot of trouble with RAID volumes showing up with names like /dev/md127. This seems especially problematic when making RAID volumes for a Gentoo system with sysresccd. I make the volumes something sensible like /dev/md1 and 1. it doesn’t boot into the system and 2. when analyzed in a subsequent boot of sysresccd, they show up with the strange device names.
This could have something to do with:
- …misunderstanding the kernel’s whole approach to RAID. Here is the kernel’s official md documentation. It may be unnervingly out of date in some subtle but important places.
- …metadata problems. Used to be 0.90 but now it’s 1.2 by default. Use --metadata=0.90 to specify explicitly during creation. Other options are 1.0 and 1.1. GRUB Legacy is apparently only ok with 0.90, while GRUB2 has an mdraid1x "module" which accommodates 1.x metadata versions. Note that metadata is stored in the per-device superblocks. What exactly is in the superblocks? Here is a reference. Use cat /sys/block/md1/md/metadata_version to check.
- …sysresccd and it being older or newer than some other component. From the mdadm man page: "When creating a partition based array, using mdadm with version-1.x metadata, the partition type should be set to 0xDA (non fs-data). This type selection allows for greater precision since using any other [RAID auto-detect (0xFD) or a GNU/Linux partition (0x83)], might create problems in the event of array recovery through a live cdrom." Note that this might not be an issue if there are no partitions at all and the device (/dev/sda) is used.
- …the --super-minor= option. This only applies to 0.90 metadata RAID sets.
- …the --uuid= option.
- …not properly specifying a --name and --homehost during RAID creation. Seems like old RAID used to make some assumptions about things and new (metadata) RAID encodes more explicit identification.
mdadm --verbose --create --auto=yes /dev/md/os --name=os \
    --metadata=default --raid-devices=2 --level=1 /dev/sda /dev/sdb
mdadm --verbose --create /dev/md0 --level=1 --raid-devices=2 \
    --metadata=0.90 /dev/sda1 /dev/sdb1
The exact problem seems to be described here as:
For version 1.2 superblocks, the preferred way to create arrays is
by using a name instead of a number. For example, if the array is
your home partition, then creating the array with the option
--name=home will cause the array to be assembled with a random
device number (which is what you are seeing now, when an array
doesn't have an assigned number we start at 127 and count
backwards), but there will be a symlink in /dev/md/ that points to
whatever number was used to assemble the array. The symlink in
/dev/md/ will be whatever is in the name field of the superblock.
So in this example, you would have /dev/md/home that would point
to /dev/md127 and the preferred method of use would be to access
the device via the /dev/md/home entry.
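So when a mystery md127 shows up, check /dev/md/ for the friendly name before assuming anything is broken. A quick sketch (the home name is just the example from the quote above):
ls -l /dev/md/                       # e.g. home -> ../md127
mdadm --detail /dev/md127 | grep -i name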
Fun fact: I had given up on my previous setup and was going to redo the entire RAID. One of the changes I planned to make was to not use partitions and try to just make a RAID volume from a couple of complete drives (/dev/sd[ab]). What shocked me was that when I created the array as such, it started mirroring the data that was on one of the drives which was left over from a set that I’d previously installed an OS on using /dev/sda1 and /dev/sdb1. The fact that this was possible and automatic is either extremely wonderful or unnerving.
Sysresccd Problems
Sadly, I believe that sysresccd has been responsible for some of the problems I’ve been experiencing. It seems that it aggressively tries to automatically assemble RAID volumes and that’d be fine except that it seems to corrupt them in certain cases.
Knoppix
modprobe md
modprobe raid1
fdisk /dev/sda <= Linux raid autodetect
fdisk /dev/sdb <= Linux raid autodetect
mdadm --verbose --create --level=1 --raid-devices=2 \
--metadata=0.90 /dev/md0 /dev/sda1 /dev/sdb1
watch cat /proc/mdstat # and wait until finished.
mkfs.ext3 -v -L RAID1VOL /dev/md0
mount /dev/md0 /mnt
ls -al /mnt
date > /mnt/integrity_check
cd ~ ; umount /dev/md0
mdadm --stop /dev/md0
shutdown -r now
*REBOOT*
modprobe md
modprobe raid1
mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 # /dev/md0 auto created
mkdir /mnt/raid /mnt/backup
mount /dev/md0 /mnt/raid
cat /mnt/raid/integrity_check; rm /mnt/raid/integrity_check
mount /dev/sdd3 /mnt/backup # Find all your data again.
time rsync -a /mnt/backup/ /mnt/raid/
Alternatives To SysrescCD
- Knoppix What? 32bit only? The knoppix64 image (ubnkern) didn’t run a 64 bit kernel. Why? Damn shame since this had a decent environment for Gentoo/RAID setup.
- The Official Gentoo LiveDVD - 3.6GB if you have room to waste.
- Debian LiveCD Note: use sudo su - to avoid going insane. Had to edit the kernel options a bit too; choose Live (amd64 failsafe) and get rid of memtest (which causes hanging).
SysRescCD's problems are discussed here and here and here.
Using Debian Live
F8
Memorex
Live <TAB>
memtest^H^H^H^H^H^H^H
sudo su -
cd /dev; mdadm -vA md0 sd[ab]1
mount /dev/md0 /mnt
mount -t proc none /mnt/proc
mount --rbind /dev /mnt/dev
chroot /mnt /bin/bash
OMG IT WORKED!
I used a debian-live-7.2-amd64-rescue.iso to set up a RAID1 with 0.90 metadata on 2 devices with partitions (i.e. /dev/sda1 and /dev/sdb1) of type fd. This was readable by Legacy Grub and the kernel was able to assemble it at boot without an initramfs.
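To confirm which metadata version a member or a running array actually ended up with (useful before blaming the bootloader), something like this should work:
mdadm --examine /dev/sda1 | grep -i version
cat /sys/block/md0/md/metadata_version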
Can Sysresc be used with RAID at all?
Sysresc is definitely aggressive about building RAID sets if it thinks there are any devices that could be assembled. In some cases this would be good but when you’re trying to erase one or set one up or both, it’s annoying.
It looks like RAID sets created with sysresc get a "homehost" called sysresccd, and then on boot (to sysresc) it thinks that it is on its home host and tries autoassembly. This is with normal (now, post 2010) metadata version 1.2.
The first thing to do is to try to use the --stop function of mdadm as described. Also the --remove. What I found is that although /proc/mdstat was empty, when I configured a drive with fdisk to have the fd type, suddenly a RAID volume had been assembled (badly). I found that doing udevadm hwdb --update can help.
mdadm --verbose --manage --stop /dev/md127
cat /proc/mdstat # make sure it is gone
mdadm --verbose --create /dev/md0 --homehost=anyhomehost --name=ssdset \
--level=1 --raid-devices=2 /dev/sda /dev/sdb
watch cat /proc/mdstat # Waiting until finished is probably smart.
reboot
mdadm --verbose --manage --stop /dev/md127
mdadm --verbose --assemble /dev/md0 /dev/sda /dev/sdb
mdadm --misc --examine /dev/sd[ab]
mdadm --misc --detail /dev/md0
Should be good to mkfs and mount. On reboot, stop the bogus /dev/md127 or whatever it is, assemble correctly, and everything should be there.
USE="static" ebuild /usr/portage/sys-fs/mdadm/sys-fs/mdadm-3.2.6-r1.ebuild compile
Then look for a static executable in /var/tmp/portage/sys-fs/mdadm-3.2.6-r1/work/mdadm-3.2.6
Using Hardware RAID On Linux with 3Ware Cards
Notes about hardware RAID using a particular 3Ware card (that I don’t use any more, but maybe this information will still be useful).
Replacing a drive after a failure
First find the problem. Use a command much like this:
:-> [hb][~]$ /sbin/tw_cli /c0 show
Unit UnitType Status %RCmpl %V/I/M Stripe Size(GB) Cache AVrfy
------------------------------------------------------------------------------
u0 RAID-5 OK - - 64K 4656.51 ON OFF
u1 SPARE OK - - - 465.753 - OFF
Port Status Unit Size Blocks Serial
---------------------------------------------------------------
p0 OK u0 465.76 GB 976773168 WD-WCANU1306978
p1 OK u0 465.76 GB 976773168 WD-WCANU1241597
p2 OK u0 465.76 GB 976773168 WD-WCANU1230955
p3 OK u0 465.76 GB 976773168 WD-WCANU1222179
p4 OK u0 465.76 GB 976773168 WD-WCANU1318737
p5 OK u0 465.76 GB 976773168 WD-WCANU1230683
p6 OK u0 465.76 GB 976773168 WD-WCANU1240889
p7 OK u0 465.76 GB 976773168 WD-WCANU1234675
p8 OK u0 465.76 GB 976773168 9QG0DRJH
p9 SMART-FAILURE u1 465.76 GB 976773168 WD-WCANU1231205
p10 OK u0 465.76 GB 976773168 WD-WCANU1255530
p11 OK u0 465.76 GB 976773168 WD-WCANU1059605
This shows that drive 9 is showing signs of flakiness, and though it may work great, it may not, and it should be replaced. It is likely that if there is a problem drive, it will be isolated on its own RAID unit, as the controller will grab the good drive from the hot spare and start using it.
Now extract the bad drive. This can be very confusing. I have proven twice this month that the labels that are on hb are good. There are no labels on puzzlebox. It’s very important to pull the right drive, especially when things are bad, so be careful.
:-> [hb][~]$ /sbin/tw_cli /c0 show
Unit UnitType Status %RCmpl %V/I/M Stripe Size(GB) Cache AVrfy
------------------------------------------------------------------------------
u0 RAID-5 OK - - 64K 4656.51 ON OFF
Port Status Unit Size Blocks Serial
---------------------------------------------------------------
p0 OK u0 465.76 GB 976773168 WD-WCANU1306978
p1 OK u0 465.76 GB 976773168 WD-WCANU1241597
p2 OK u0 465.76 GB 976773168 WD-WCANU1230955
p3 OK u0 465.76 GB 976773168 WD-WCANU1222179
p4 OK u0 465.76 GB 976773168 WD-WCANU1318737
p5 OK u0 465.76 GB 976773168 WD-WCANU1230683
p6 OK u0 465.76 GB 976773168 WD-WCANU1240889
p7 OK u0 465.76 GB 976773168 WD-WCANU1234675
p8 OK u0 465.76 GB 976773168 9QG0DRJH
p9 DRIVE-REMOVED - - - -
p10 OK u0 465.76 GB 976773168 WD-WCANU1255530
p11 OK u0 465.76 GB 976773168 WD-WCANU1059605
This is as expected and now the hot spare unit (u1) goes away.
Put a new drive in. Much easier than pulling the bad one out - don’t forget to bring a Phillips screwdriver to the machine room, by the way.
:-> [hb][~]$ /sbin/tw_cli /c0 show
Unit UnitType Status %RCmpl %V/I/M Stripe Size(GB) Cache AVrfy
------------------------------------------------------------------------------
u0 RAID-5 OK - - 64K 4656.51 ON OFF
Port Status Unit Size Blocks Serial
---------------------------------------------------------------
p0 OK u0 465.76 GB 976773168 WD-WCANU1306978
p1 OK u0 465.76 GB 976773168 WD-WCANU1241597
p2 OK u0 465.76 GB 976773168 WD-WCANU1230955
p3 OK u0 465.76 GB 976773168 WD-WCANU1222179
p4 OK u0 465.76 GB 976773168 WD-WCANU1318737
p5 OK u0 465.76 GB 976773168 WD-WCANU1230683
p6 OK u0 465.76 GB 976773168 WD-WCANU1240889
p7 OK u0 465.76 GB 976773168 WD-WCANU1234675
p8 OK u0 465.76 GB 976773168 9QG0DRJH
p9 OK - 465.76 GB 976773168 WD-WCAS81281763
p10 OK u0 465.76 GB 976773168 WD-WCANU1255530
p11 OK u0 465.76 GB 976773168 WD-WCANU1059605
So now drive 9 is back in and it seems ok, but it’s not part of any unit. It’s basically doing nothing useful.
Assign the new drive to a hot spare group. This is actually the important and non-obvious part of the operation.
:-> [hb][~]$ /sbin/tw_cli /c0 add type=spare disk=9
Creating new unit on controller /c0 ... Done. The new unit is /c0/u1.
:-> [hb][~]$ /sbin/tw_cli /c0 show
Unit UnitType Status %RCmpl %V/I/M Stripe Size(GB) Cache AVrfy
------------------------------------------------------------------------------
u0 RAID-5 OK - - 64K 4656.51 ON OFF
u1 SPARE OK - - - 465.753 - OFF
Port Status Unit Size Blocks Serial
---------------------------------------------------------------
p0 OK u0 465.76 GB 976773168 WD-WCANU1306978
p1 OK u0 465.76 GB 976773168 WD-WCANU1241597
p2 OK u0 465.76 GB 976773168 WD-WCANU1230955
p3 OK u0 465.76 GB 976773168 WD-WCANU1222179
p4 OK u0 465.76 GB 976773168 WD-WCANU1318737
p5 OK u0 465.76 GB 976773168 WD-WCANU1230683
p6 OK u0 465.76 GB 976773168 WD-WCANU1240889
p7 OK u0 465.76 GB 976773168 WD-WCANU1234675
p8 OK u0 465.76 GB 976773168 9QG0DRJH
p9 OK u1 465.76 GB 976773168 WD-WCAS81281763
p10 OK u0 465.76 GB 976773168 WD-WCANU1255530
p11 OK u0 465.76 GB 976773168 WD-WCANU1059605
Now the controller knows that this is a hot spare. Everything looks good again.
Immediately go buy a replacement! For hb, there’s one in my office that should not be used for anything but hb and there’s one in the machine room locker which the full sys admin team can get at in my absence. They should both be there!
Bonus step: send the dead drive in to the manufacturer if it’s still under warranty. Can’t have too many of these things lying around.
Using Hardware RAID On Linux with Areca Cards
cli64
I usually put this utility at /root/cli64.
To get help do this.
/root/cli64 help
/root/cli64 vsf -h
Show status of current raid environment. This is good for diagnostic checks.
disk info
Note that drive info also may work, but I’ve found that it sometimes does not, so disk info should be considered canonical.
If there is a failure you probably need to pull that failed disk out. It is important to get the correct number. The proper number for finding the physical disk is in the column labeled "Slot#".
There is password protection. I generally have no need for it and will set it to "0000". This command must be issued in the session that requires password clearance. Just do this before proceeding.
set password=0000
However, I did confirm that the password does not need to be set to do a volume check. To check raid consistency in a volume (and possibly to trigger any deep I/O errors) try this.
vsf check vol=3
vsf info
vsf stopcheck
Note that the number seems to be the # when you do vsf info. So "1" for "vs0", and "3" for "vs2". Using vsf info will show you the progress of a running check.
CLI> vsf info
# Name Raid Name Level Capacity Ch/Id/Lun State
===============================================================================
1 vs0 rs0 Raid5 12000.0GB 00/00/00 Normal
2 vs1 rs1 Raid5 12000.0GB 00/00/01 Normal
3 vs2 rs2 Raid5 12000.0GB 00/00/02 Checking(0.2%)
===============================================================================
GuiErrMsg<0x00>: Success.
To set auto activation of an incomplete RAID, use this (1 is on, 0 is off). Unfortunately, it doesn’t always work.
sys autoact p=1
To see what is going on with a "raid set":
rsf info raid=3
If you pull the bad drive and replace it, you might get a "Free" status. This doesn’t help anything. Either your previous Hot Spares are hard at work now becoming primary working drives or you will be activating them to do so. Either way, you want that "Free" drive to be a Hot Spare (assuming you weren’t dumb enough to set up a system without a hot spare). To do this you need the following command with the number of the drive. The number is important to get right and it is confusing. The disk info command has a column labeled "#" (the first column); use this number to specify which drive to turn into a hot spare. This is (probably) different from the "Slot#" you use to find the physical bay to pull a failed drive out of. For example, this turns the drive in physical bay 17 (listed by disk info as drive 25) into a hot spare.
rsf createhs drv=25
Sometimes there is a failure and the raid set just sits there. I think this tends to happen when the drive fails during a reboot (which is a time slightly more prone to failure). Here is a sequence where I check a raid set which contained a "Failed" drive that was replaced and turned to "Free". That is not ok, since you actually want the drive’s status to be "rs2" (or whatever raid set is correct) and the "Raid Set State" in rsf info to be "Rebuilding".
CLI> rsf info raid=3
Raid Set Information
===========================================
Raid Set Name : rs2
Member Disks : 7
Total Raw Capacity : 14000.0GB
Free Raw Capacity : 14000.0GB
Min Member Disk Size : 2000.0GB
Raid Set State : Incompleted
===========================================
GuiErrMsg<0x00>: Success.
CLI> rsf activate raid=3
GuiErrMsg<0x00>: Success.
CLI> rsf info raid=3
Raid Set Information
===========================================
Raid Set Name : rs2
Member Disks : 7
Total Raw Capacity : 14000.0GB
Free Raw Capacity : 0.0GB
Min Member Disk Size : 2000.0GB
Raid Set State : Rebuilding
===========================================
GuiErrMsg<0x00>: Success.
How’s that rebuild going? Check with:
`cli64 vsf info`
If there are problems, check the log:
`cli64 event info`
Areca support likes to see what you’re running:
`cli64 sys info`
`cli64 sys showcfg`
Note
If a drive fails during a reboot, the RAID card doesn’t know what to make of the situation. So instead of automatically rebuilding from a hot spare, it will just do nothing. You have to activate it as shown above. This means that whenever the file server is booted, a drive status report should be generated to make sure everything is starting out properly.
Note
On fs11 (data), when the array contained a failed drive, all red lights were on. Removing the bad drive turned all the LEDs off. Also I couldn’t get drive identify drv=? to work. However, on this particular machine some forward thinking person (me) had labeled all the bays with the correct "Slot#". Finally, when I put the new drive in with 7 drives already functioning as a "Normal" RAID set, the new drive seems to have automatically been configured as a "HotSpare".
Using Hardware RAID On Linux with LSI Cards (Dell)
A typical Dell server has the following RAID controller.
RAID bus controller:
LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)
Downloads
Download the manual and MegaCLI program from the vendor’s (LSI/Broadcom) support site. Put "MegaCLI" in the "keyword" field of the search (nothing else).
Look for the one at the top called MegaCLI 5.5 T2. I think the rest are not useful or necessary. This creates a file called 8-07-14_MegaCLI.zip which contains binaries for many platforms. The Linux one is really an RPM (good job with the double compression).
The RPM installs the following files.
/opt/MegaRAID/MegaCli/MegaCli
/opt/MegaRAID/MegaCli/MegaCli64
/opt/MegaRAID/MegaCli/libstorelibir-2.so.14.07-0
If you are trying to use this on a (64bit!) SysrescCD (or some other non Red Hat system), use
rpm -ivh --nodeps MegaCli-8.07.14-1.noarch.rpm
This will put the correct files in place in /opt.
Or to just extract the files you need directly with no RPM installation do something like this.
rpm2cpio MegaCli-8.07.14-1.noarch.rpm | cpio --extract --list
rpm2cpio MegaCli-8.07.14-1.noarch.rpm | cpio --extract --make-directories --verbose
Usage
This general summary from Cisco shows a lot of good usage tips for MegaCLI including the replacing of a drive.
Enclosure Info
sudo /opt/MegaRAID/MegaCli/MegaCli64 -EncInfo -aALL
Note the field here "Device ID" which in this case is "32". This number can be used in the "E" place when drives are specified with the "[E:S]" syntax.
Battery Backup Unit Check
sudo /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -aALL
Configuration Settings
sudo /opt/MegaRAID/MegaCli/MegaCli64 -CfgDsply -aALL
Adapter Info (Entire Device)
sudo /opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aALL
This specifically looks interesting.
sudo /opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aALL | grep 'Errors *:'
Though I don’t know exactly what it means.
Logical Disk Checks
sudo /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL
Physical Device Checks
sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL
sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDInfo -PhysDrv '[32:7]' -aALL
In the last one, the 32 is the "enclosure" number and the 7 is the slot. This basically shows all info about slot 7.
Replacing
Set the state to offline. Note that this may just occur naturally after replacing a drive. If the drive is already offline, skip to the next command.
/opt/MegaRAID/MegaCli/MegaCli64 -PDOffline -PhysDrv '[32:7]' -aN
I did this and the hot spare’s firmware state went to "Rebuild".
Mark as missing is probably only useful if the drive really is missing. Skip this if you’re just planning a drive swap.
/opt/MegaRAID/MegaCli/MegaCli64 -PDMarkMissing -PhysDrv[32:7] -a0
Prepare for removal did seem to work and I believe it spins down the drive so you’re not yanking out a spinning drive. I believe that it also turns off the bottom LED on the enclosure.
/opt/MegaRAID/MegaCli/MegaCli64 -PdPrpRmv -PhysDrv[32:7] -a0
Blink the LED of the drive. It might turn out that the light is already blinking so this gives no peace of mind. Try blinking the neighbors to home in on exactly the bay you think you’re working with.
/opt/MegaRAID/MegaCli/MegaCli64 -PDLocate -PhysDrv '[32:1]' -aN
/opt/MegaRAID/MegaCli/MegaCli64 -PDLocate stop -PhysDrv '[32:1]' -aN
Adapter: 0: Device at EnclId-32 SlotId-1 --
PD Locate Start Command was successfully sent to Firmware
Replace missing drive only if there’s no hot spare taking over i.e. the array is degraded and needs to be fixed with this new drive.
/opt/MegaRAID/MegaCli/MegaCli64 -PdReplaceMissing -PhysDrv[32:7] -ArrayN -rowN -aN
If there is a hot spare that has just switched over when you took the bad actor offline (or it happened automatically), then this new drive will need to become the hot spare. That is described below in the Hot Spare section.
I don’t know what the N variables here are. I didn’t do any of this. I just yanked the bad drive out and started from here.
Here’s what I did do. After physically replacing the drive its PDInfo can be found like this (this example is for enclosure 32, drive 7, adaptor 0).
sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDInfo -PhysDrv[32:7] -a0
Look for "Firmware state:" which might be rebuilding. Eventually it stops at "offline". This makes the logical disk information show the set as "degraded" still. Check for that with this.
sudo /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -L1 -a0
If the drive says "offline" and you want it to be participating in the RAID do something like this.
sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDOnline -PhysDrv[32:7] -a0
Note that this is the syntax that worked for me (no quoting of the brackets even). This produced this message.
EnclId-32 SlotId-7 state changed to OnLine.
Once you have all the drives "online" check to see if the State is still "degraded". The State should be "Optimal".
See page 254 of the manual for more information.
Hot Spare
When I first got to the RAID array on example.edu, drive number 11 had the following condition.
Slot Number: 11
Firmware state: Unconfigured(good), Spun Up
At first I didn’t understand what this meant and figured it might be possible that it could be automatically used as a hot spare in this condition. However, I began to suspect it needed to be explicitly set. At first that failed because of the "Unconfigured" state.
:-< [example.edu][~/RAID]$ sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDHSP -Set -Dedicated -Array0 -PhysDrv[32:11] -a0
Adapter: 0: Set Physical Drive at EnclId-32 SlotId-11 as Hot Spare Failed.
FW error description:
The specified device is in a state that doesn't support the requested command.
Exit Code: 0x32
The trick is to "clear" the drive. Obviously be very careful about which drive you choose to clear. Here I am clearing drive #11.
:-< [example.edu][~/RAID]$ sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDClear -Start -PhysDrv[32:11] -a0
Started clear progress on device(Encl-32 Slot-11)
:-> [example.edu][~/RAID]$ sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDHSP -Set -PhysDrv[32:11] -a0
Adapter: 0: Set Physical Drive at EnclId-32 SlotId-11 as Hot Spare Failed.
FW error description:
The current operation cannot be performed because the physical drive clear is in progress.
Exit Code: 0x25
:-> [example.edu][~/RAID]$ sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDClear -ShowProg -PhysDrv[32:11] -a0
Clear Progress on Device at Enclosure 32, Slot 11 Completed 3% in 8 Minutes.
Next morning when the drive is finished clearing.
:-> [example.edu][~/RAID]$ sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDClear -ShowProg -PhysDrv[32:11] -a0
Device(Encl-32 Slot-11) is not in clear process
And now that it’s "clear" the hot spare can be set.
:-> [example.edu][~/RAID]$ sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDHSP -Set -PhysDrv[32:11] -a0
Adapter: 0: Set Physical Drive at EnclId-32 SlotId-11 as Hot Spare Success.
Now when I check drives all drives are "Online, Spun Up" except number 11 which has this state.
Slot Number: 11
Firmware state: Hotspare, Spun Up
Other Stuff Which Maybe Should Be Used
The number N of the array parameter is the Span Reference you get using
/opt/MegaRAID/MegaCli/MegaCli64 -CfgDsply -aALL
and the number N of the row parameter is the Physical Disk in that span or array starting with zero (it’s not the physical disk’s slot!).
Rebuild drive - Drive status should be "Firmware state: Rebuild"
/opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -Start -PhysDrv '[32:7]' -aN
/opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -Stop -PhysDrv '[32:7]' -aN
/opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ShowProg -PhysDrv '[32:7]' -aN
/opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ProgDsply -physdrv '[32:7]' -aN
Disks
The disks in example.edu appear to be these.
They are Dell branded Western Digital 2TB, 3.5", 7200 RPM, SATA II, 64MB Cache
WD2003FYYS
Specifically -
WD-WMAY04398245 WDC WD2003FYYS-18W0B0 01.01D02
The drive bay labels are consistent with this.
Output
sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | egrep 'Slot|Firmware state'
When the drives are good (here #7 has just been replaced and is rebuilding) it looks like this.
:-> [example.edu][~/RAID]$ ./check_drives
Slot Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Firmware state: Online, Spun Up
Slot Number: 2
Firmware state: Online, Spun Up
Slot Number: 3
Firmware state: Online, Spun Up
Slot Number: 4
Firmware state: Online, Spun Up
Slot Number: 5
Firmware state: Online, Spun Up
Slot Number: 6
Firmware state: Online, Spun Up
Slot Number: 7
Firmware state: Rebuild
Slot Number: 8
Firmware state: Online, Spun Up
Slot Number: 9
Firmware state: Online, Spun Up
Slot Number: 10
Firmware state: Online, Spun Up
Slot Number: 11
Firmware state: Unconfigured(good), Spun Up
A failure looks like this.
Slot Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Firmware state: Online, Spun Up
Slot Number: 2
Firmware state: Online, Spun Up
Slot Number: 3
Firmware state: Online, Spun Up
Slot Number: 4
Firmware state: Online, Spun Up
Slot Number: 5
Firmware state: Online, Spun Up
Slot Number: 6
Firmware state: Online, Spun Up
Slot Number: 7
Firmware state: Failed
Slot Number: 8
Firmware state: Online, Spun Up
Slot Number: 9
Firmware state: Online, Spun Up
Slot Number: 10
Firmware state: Online, Spun Up
Slot Number: 11
Firmware state: Unconfigured(good), Spun Up
The correct status looks like this.
Slot Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Firmware state: Online, Spun Up
Slot Number: 2
Firmware state: Online, Spun Up
Slot Number: 3
Firmware state: Online, Spun Up
Slot Number: 4
Firmware state: Online, Spun Up
Slot Number: 5
Firmware state: Online, Spun Up
Slot Number: 6
Firmware state: Online, Spun Up
Slot Number: 7
Firmware state: Online, Spun Up
Slot Number: 8
Firmware state: Online, Spun Up
Slot Number: 9
Firmware state: Online, Spun Up
Slot Number: 10
Firmware state: Online, Spun Up
Slot Number: 11
Firmware state: Hotspare, Spun Up
An even better check.
sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL |\
grep -i '\(alert\)\|\(failure\)\|\(error\)\|\(firmware state\)\|\(attention\)\|\(slot\)' |\
sed '/^Slot/s/^/== /'
Here’s a full script to run from cron. Run with no args, it’s silent unless there are problems. Run with any kind of arg, it prints the full report. Run the no-arg version every 15 minutes or so and the arg version every day to get a positive assertion that the check is working.
#!/bin/bash
RAIDCMD=/opt/MegaRAID/MegaCli/MegaCli64

function full_report {
    date
    ${RAIDCMD} -PDList -aALL |\
      grep -i '\(alert\)\|\(failure\)\|\(error\)\|\(firmware state\)\|\(attention\)\|\(slot\)' |\
      sed '/^Slot/s/^/== /'
}

if [ -n "$1" ]; then
    full_report # Full report always. Run once per day.
else
    # No args - Silent unless there's a problem. Run every 15 min.
    if ! ${RAIDCMD} -PDList -aALL | grep ^Firmware | uniq | awk '{if (NR > 1) {print NR; exit 1}}'; then
        echo "THERE IS MORE THAN ONE DRIVE STATE. A DISK MAY HAVE DEGRADED."
        full_report
    fi
fi
Yes, this script is uncool because there should be a hot spare, which makes for two different states even when no drives have failed. But hey, I didn’t set that up. Just keep it in mind and adjust the script as necessary.
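If you do have a hot spare, one way to adjust it is to alert on any firmware state other than the two expected ones, reusing RAIDCMD and full_report from the script above. A sketch:
# Drop-in replacement for the state-count test: alert on anything that is
# not "Online, Spun Up" or "Hotspare, Spun Up".
if ${RAIDCMD} -PDList -aALL | grep '^Firmware state' \
    | grep -v -e 'Online, Spun Up' -e 'Hotspare, Spun Up' | grep -q . ; then
    echo "UNEXPECTED DRIVE STATE. A DISK MAY HAVE DEGRADED."
    full_report
fi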