The uncertainty is not whether your hard drives will fail. The question is when they will fail. The answer: at the absolute most inconvenient moment.


Linux Software RAID

This describes how to use Linux to configure inexpensive disks in redundant arrays without any proprietary hardware. The huge advantage of this technique is that there is no RAID controller that can fail and leave your data stranded in a striping pattern only that controller understands.

Warning
If your motherboard has a RAID "feature" I would not use it. If your motherboard dies, or any component failure renders it inoperable, your data could easily be trapped until you track down a completely identical motherboard on eBay. Linux software RAID does not have this problem.

Clearing Stupid Windows Cruft

So you’ve decommissioned an old Windows machine and you want to use its drive in your proper Linux system. You dd from /dev/zero all over it and are ready to format it. But then you find a weird and tenacious entry in lsblk or /dev/mapper/ belonging to it. How do you get Linux to stop worrying about it?
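If you want to see what device-mapper is holding on to before removing anything, something like this will show it (the pdc_* name below is just what this particular machine produced; names starting with pdc_ or isw_ are typically fake-RAID leftovers):

dmsetup ls   # list device-mapper entries
lsblk        # cross-check which block devices they sit on

Then remove the offender: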

sudo dmsetup remove /dev/mapper/pdc_bebdacbjhg

Problem solved.

Setup

Set up partitions to use the partition type "fd" which is "Linux raid autodetect".

          Boot Start  End     Blocks     Id  System
/dev/sda1   *  1      25      200781     fd  Linux raid autodetect
/dev/sda2      26     269     1959930    82  Linux swap / Solaris
/dev/sda3      270    5140    39126307+  fd  Linux raid autodetect
/dev/sda4      5141   10011   39126307+  fd  Linux raid autodetect

Here I am making a boot partition, a swap partition and 2 identically sized data partitions. These can be made into RAID volumes with each other or across other physical drives. I leave the swap as swap because I don’t think it’s entirely useful to have swap space RAID protected.
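If you would rather script the partitioning than click through fdisk, a reasonably modern sfdisk (util-linux 2.26 or later) takes a layout on stdin. This is only a sketch and the sizes are assumptions approximating the table above:

sfdisk /dev/sda <<'EOF'
,200M,fd,*
,2G,82
,38G,fd
,,fd
EOF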

Once you have the partition table the way you want it on one physical drive, you can clone it to another by doing something like this:

sfdisk -d /dev/sda | sfdisk  /dev/sdb
Note
Double check that this trick works. I suspect that it may not with this exact syntax any more.
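If the one-liner misbehaves, the same idea done more cautiously (dump to a file, eyeball it, then feed it to the target) still works, and sgdisk covers the GPT case. A sketch:

sfdisk -d /dev/sda > sda.table   # dump, so you can inspect it first
sfdisk /dev/sdb < sda.table

# GPT equivalent (sgdisk from the gdisk package):
sgdisk --backup=table.bin /dev/sda
sgdisk --load-backup=table.bin /dev/sdb
sgdisk -G /dev/sdb               # randomize GUIDs so the clone isn't an exact twin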

You need to have the Linux RAID enabled in the kernel. From an install disk you might need to:

modprobe md-mod # Or simply `modprobe md`
modprobe raid1

(Or whatever RAID level you’re after.) To enable this in the kernel config, the option lives here:

Device_Drivers->Multi-device_support_(RAID_and_LVM)->RAID_support->RAID-1_(mirroring)_mode

You might also need this if it’s not already there:

# emerge -av sys-fs/mdadm

Now set up some md devices (md = multiple devices):

livecd ~ # mknod /dev/md1 b 9 1
livecd ~ # mknod /dev/md2 b 9 2
livecd ~ # mknod /dev/md3 b 9 3
livecd ~ # mknod /dev/md4 b 9 4

Time to actually set up the RAID devices:

livecd ~ # mdadm --create --verbose /dev/md1 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mdadm: size set to 200704K
mdadm: array /dev/md1 started.
livecd ~ # mdadm --create --verbose /dev/md3 --level=1 --raid-devices=2 /dev/sda3 /dev/sdb3
mdadm: size set to 39126208K
mdadm: array /dev/md3 started.
livecd ~ # mdadm --create --verbose /dev/md4 --level=1 --raid-devices=2 /dev/sda4 /dev/sdb4
mdadm: size set to 39126208K
mdadm: array /dev/md4 started.
Note
If you’re making a RAID1 set out of the one partition which needs to be bootable, you might think about including --metadata=1.0 so that the RAID metadata (stored at the end of the device with that version) does not conflict with what the bootloader expects to find at the start. Apparently the default can cause all kinds of booting problems and incompatibilities. It might just be safer to have a boot partition that is not RAID and mirror it manually. This is reasonable since it shouldn’t be too dynamic, and it guarantees a booting machine regardless of the kernel’s ability to auto-assemble RAID during initialization.
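For what it’s worth, a boot mirror created with end-of-device metadata would look something like this (a sketch, not part of the session above):

mdadm --create --verbose /dev/md1 --metadata=1.0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1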

Here’s an example of a serious RAID setup:

mdadm --create --verbose /dev/md3 --level=5 --raid-devices=22 /dev/sdc \
/dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj \
/dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq \
/dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx

Here’s one with a hot spare:

mdadm --verbose --create /dev/md3 --level=5 --raid-devices=7 \
--spare-devices=1 /dev/sdd /dev/sde /dev/sdf /dev/sdc /dev/sdg \
/dev/sdi /dev/sdj /dev/sdh

Note that if the RAID volumes were already set up, then just make the devices with mknod and then, instead of "creating" the RAID components, simply "assemble" them like this:

mdadm --assemble --verbose /dev/md4 /dev/sda4 /dev/sdb4

Note that if the RAID volumes were already set up and there are redundant disks, it might happen that the RAID volume starts without its redundant mirror. For example, if you find:

md1 : active raid1 sda1[0]
      521984 blocks [2/1] [U_]

And you know that it should be being mirrored with /dev/sdb1, then do this:

# mdadm /dev/md1 --add /dev/sdb1

And then look for this:

md1 : active raid1 sdb1[1] sda1[0]
      521984 blocks [2/2] [UU]

The add itself returns right away, but it actually takes a long time for the mirror to get established properly (watch the recovery percentage in /proc/mdstat). Presumably this can be done with a drive full of data, in which case this process copies everything over to the new mirror.

Checking

You can check up on a running array with this.

watch -n 1 cat /proc/mdstat

Note that a better way to get more information about what is going on with a RAID array is through mdadm:

mdadm --query --detail /dev/md3

Or

mdadm --misc --detail /dev/md0

For information about the components of a RAID set use examine.

mdadm --misc --examine /dev/sd[ab]

To stop a RAID array basically means to have the system stop worrying about it:

mdadm --misc --stop /dev/md3

This is useful if you start an array but something goes wrong and you need to reuse the disks that are part of the aborted or inactive or unwanted array.

Since this is Linux, you can get an absurd amount of information about the status of your RAID volume. Check out the virtual /sys/block/md1/... directory tree and explore very fine details of the setup. Read about what all that good stuff is in the kernel’s md.txt documentation.

# cat /sys/block/md1/md/array_state
clean

Once you’re comfortable with checking on the status of the RAID, a good thing to do is to actively test it for behavior under stress. This is a good guide to testing RAID setups.
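One simple exercise (before trusting the array with real data) is to fail a member on purpose, watch the array run degraded, and then add it back. Something along these lines, using the md3 set from above:

mdadm /dev/md3 --fail /dev/sdb3     # simulate a failure
cat /proc/mdstat                    # should show [2/1] [U_]
mdadm /dev/md3 --remove /dev/sdb3   # pull the "failed" member out
mdadm /dev/md3 --add /dev/sdb3      # put it back and watch the resync
watch cat /proc/mdstat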

mdadm.conf

For temporary purposes, you can set up an /etc/mdadm.conf file like this:

mdadm --detail --scan > /etc/mdadm.conf

but for a more permanent installation (once the OS is installed), this is a good format for this file:

Contents of /etc/mdadm.conf
DEVICE    /dev/sda*
DEVICE    /dev/sdb*
ARRAY           /dev/md1 devices=/dev/sda1,/dev/sdb1
#ARRAY           /dev/md2 devices=/dev/sda2,/dev/sdb2
ARRAY           /dev/md3 devices=/dev/sda3,/dev/sdb3
ARRAY           /dev/md4 devices=/dev/sda4,/dev/sdb4
MAILADDR   sysnet-admin@sysnet.ucsd.edu
Now for some filesystems
livecd ~ # mkfs.ext2 /dev/md1
livecd ~ # mkswap /dev/sda2
livecd ~ # mkswap /dev/sdb2
livecd ~ # swapon /dev/sd[ab]2
livecd ~ # mkfs.ext3 /dev/md3
livecd ~ # mkfs.ext3 /dev/md4
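Assuming the layout above (md1 for /boot, md3 for /, md4 for extra data; the mount points and options here are my assumptions), the relevant /etc/fstab lines might look something like this:

/dev/md1   /boot   ext2   noauto,noatime   1 2
/dev/md3   /       ext3   noatime          0 1
/dev/md4   /data   ext3   noatime          0 2
/dev/sda2  none    swap   sw               0 0
/dev/sdb2  none    swap   sw               0 0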
Getting the bootloader on both drives
emerge grub
grub --no-floppy
Setup MBR on /dev/sda:
root (hd0,0)
setup (hd0)
Setup MBR on /dev/sdb:
device (hd0) /dev/sdb
root (hd0,0)
setup (hd0)
quit
Note
Gentoo people should see if grub needs to be compiled with the device-mapper USE flag. Gentoo people should also emerge -avuD mdadm at some point.

Don’t forget that your kernel will want a root=/dev/md3 parameter.
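In GRUB Legacy terms that means something like this in the boot menu config (a sketch; the kernel image name is a placeholder, adjust to whatever you installed):

title Linux (RAID1 root)
root (hd0,0)
kernel /vmlinuz root=/dev/md3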

Recovery

So imagine that you have some error like this:

[hb.xed.ch][~]# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdd2[1] sdc2[0]
      1951808 blocks [2/2] [UU]

md2 : active raid1 sdd3[1]
      78268608 blocks [2/1] [_U]

md0 : active raid1 sdd1[1] sdc1[0]
      192640 blocks [2/2] [UU]

One member of md2 (a partition on the first disk) is dead and no longer being used, so that array is running on a single mirror. Install the new disk; it will show up as the blank drive with no partitions.

The partition table has to be recreated. Make sure you get this right!!

Display the partition tables of both candidate drives
:-> [swamp][~]$ fdisk -l /dev/sdc

Disk /dev/sdc: 82.3 GB, 82348277760 bytes
255 heads, 63 sectors/track, 10011 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1   *           1          25      200781   fd  Linux raid autodetect
/dev/sdc2              26         269     1959930   82  Linux swap / Solaris
/dev/sdc3             270        5140    39126307+  fd  Linux raid autodetect
/dev/sdc4            5141       10011    39126307+  fd  Linux raid autodetect
:-> [swamp][~]$ fdisk -l /dev/sdd

Disk /dev/sdd: 82.3 GB, 82348277760 bytes
255 heads, 63 sectors/track, 10011 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Disk /dev/sdd doesn't contain a valid partition table

From this you can see that sdd is the blank drive, no doubt about it. So don’t copy sdd’s table to sdc; that would be bad. Note also that the drives are identical in size. This is good. In fact, it might be smart to leave a few percent of each drive unpartitioned from the start, to allow for the small size differences between brands of the same nominal capacity.

Rebuild the partition table using the same technique used at installation:

sfdisk -d /dev/sdc | sfdisk /dev/sdd

Double check:

fdisk -l /dev/sdc; fdisk -l /dev/sdd

Set up swap space on the other volume and go ahead and use it:

:-> [swamp][~]$ mkswap /dev/sdd2
Setting up swapspace version 1, size = 2006962 kB
no label, UUID=81258423-4245-431b-8fb9-137e9651e7dd
:-> [swamp][~]$ swapon /dev/sdd2

Resetting Old RAID Partitions

Sometimes you might run one drive for a while and then introduce its matching RAID mirror, and the mirror will spontaneously associate with an md device. Here’s how to clear that before adding it back to the real md set.

If I add hdc1 and it shows an md0 that shouldn’t exist, like this:

md0 : active raid1 hdc1[1]
      104320 blocks [2/1] [_U]

Mark the member as failed:

# mdadm --manage /dev/md0 --fail /dev/hdc1

Or if it’s running and you don’t want it to be:

# mdadm --manage --stop /dev/md0

And then:

# mdadm --manage /dev/md0 --remove /dev/hdc1

This actually didn’t seem to work! I just fixed the mdadm.conf and rebooted. Maybe I needed to:

# mdadm --zero-superblock /dev/sda
# mdadm --zero-superblock /dev/sdb

Sometimes none of this works! You have to use dd. You can overwrite the whole thing, but try this first; it zeroes the last 512 KiB of the device, which is where 0.90 and 1.0 superblocks live (1.1 and 1.2 superblocks sit near the start instead).

DEVICE=/dev/sdn
SECTORS=$(blockdev --getsz $DEVICE)
dd if=/dev/zero of=${DEVICE} bs=512 seek=$(( ${SECTORS} - 1024 )) count=1024
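A gentler alternative worth trying before the dd hammer is wipefs, which recognizes md superblocks at either end of the device:

wipefs /dev/sdn      # list the signatures it can see, without touching anything
wipefs -a /dev/sdn   # erase them all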

Do the recovery mirroring

Do the small easy one first:

:-> [swamp][~]$ mdadm --manage /dev/md1 --add /dev/sdd1
mdadm: added /dev/sdd1

Now check to see that it’s working:

:-> [swamp][~]$ cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdd1[2] sdc1[0]
      200704 blocks [2/1] [U_]
      [===================>.]  recovery = 98.4% (198912/200704) finish=0.0min speed=39782K/sec

Check again to see it complete:

:-> [swamp][~]$ cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 sdd1[1] sdc1[0]
      200704 blocks [2/2] [UU]

Do the rest:

:-> [swamp][~]$ mdadm --manage /dev/md3 --add /dev/sdd3
:-> [swamp][~]$ mdadm --manage /dev/md4 --add /dev/sdd4

Rebuilds run sequentially, so don’t panic if one array sits at resync=DELAYED while it waits for another to finish. If you issue these commands, they’ll all get done eventually.
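Resync speed is throttled by the md driver; if the machine is otherwise idle you can raise the floor. The values are in KB/s per device, and the 50000 below is just an example:

cat /proc/sys/dev/raid/speed_limit_min /proc/sys/dev/raid/speed_limit_max
echo 50000 > /proc/sys/dev/raid/speed_limit_min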

You can keep an eye on it:

watch cat /proc/mdstat

That might look like:

raven ~ # cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 hdc1[1] hda1[0]
      104320 blocks [2/2] [UU]

md3 : active raid1 hdc3[2] hda3[0]
      9775488 blocks [2/1] [U_]
      [===>.................]  recovery = 18.0% (1767424/9775488) finish=4.8min speed=27328K/sec

md4 : active raid1 hdc4[2] hda4[0]
      145910272 blocks [2/1] [U_]
        resync=DELAYED

unused devices: <none>

Go ahead and put the bootloader in the MBR on the new drive using the procedure outlined above.

Note
If you’re thinking about replacing anything while powered up, check out this information about what hardware is ok with hot swapping.

Mounting a partition that used to be in a RAID set

Maybe you’ve decommissioned a pair of mirrored RAID drives and you’re thinking of reusing them but you want to see what was on them to make sure it’s nothing important. Here is how to simply mount this drive on a different Linux box. You can check to see that the drive is plugged in and recognized somewhere with:

$ cat /proc/partitions
major minor  #blocks  name
 3     0  156290904 hda
 3     1     104391 hda1
 3     2     498015 hda2
22     0  156290904 sdb
22     1     104391 sdb1
22     2     498015 sdb2

There it is, sdb. But if you try:

$ mount /dev/sdb1 /mnt/b/1

You get:

mount: unknown filesystem type 'mdraid'

It doesn’t work like that. You need to make sure all your kernel modules are happy for RAID (having the drivers compiled in works too):

# modprobe md
# modprobe raid1

The next thing to have ready is mdadm. This can be installed with apt install mdadm, but I found that it pulled in a lot of serious cruft that I didn’t want (email stuff, MySQL, etc). To just get a quick, easy mdadm, I found this effective:

apt install libudev-dev
git clone git://git.kernel.org/pub/scm/utils/mdadm/mdadm.git
cd mdadm
make
./mdadm --version

If that’s good, then you can do this (use ./mdadm from the build directory if you didn’t actually install it):

# sudo mdadm --assemble --run /dev/md1 /dev/sdb1
# sudo mount /dev/md1 /mnt/raid

And you’re looking at the contents.

md127 WTF?

Recently I’ve been having a lot of trouble with RAID volumes showing up with names like /dev/md127. This seems especially problematic when making RAID volumes for a Gentoo system with sysresccd. I make the volumes something sensible like /dev/md1 and then 1. it doesn’t boot into the system, and 2. when analyzed in a subsequent boot of sysresccd, they show up with the strange device names.

This could have something to do with:

  • …misunderstanding the kernel’s whole approach to RAID. Here is the kernel’s official md documentation. It may be unnervingly out of date in some subtle but important places.

  • …metadata problems. The default used to be 0.90 but is now 1.2. Use --metadata=0.90 to specify the old format explicitly during creation; other options are 1.0 and 1.1. GRUB Legacy is apparently only ok with 0.90, while GRUB2 has an mdraid1x "module" which accommodates 1.x metadata versions. Note that metadata is stored in the per-device superblocks. What exactly is in the superblocks? Here is a reference. Use cat /sys/block/md1/md/metadata_version to check.

  • …sysresccd and it being older or newer than some other component. From the mdadm man page: "When creating a partition based array, using mdadm with version-1.x metadata, the partition type should be set to 0xDA (non fs-data). This type selection allows for greater precision since using any other [RAID auto-detect (0xFD) or a GNU/Linux partition (0x83)], might create problems in the event of array recovery through a live cdrom." Note that this might not be an issue if there are no partitions at all and the device (/dev/sda) is used.

  • …the --super-minor= option. This only applies to 0.90 metadata RAID sets.

  • …the --uuid= option.

  • …not properly specifying a --name and --homehost during RAID creation. Seems like old RAID used to make some assumptions about things and new (metadata) RAID encodes more explicit identification.

    mdadm --verbose --create --auto=yes /dev/md/os --name=os \
    --metadata=default --raid-devices=2 --level=1 /dev/sda /dev/sdb
    mdadm --verbose --create /dev/md0 --level=1 --raid-devices=2 \
    --metadata=0.90 /dev/sda1 /dev/sdb1

The exact problem seems to be described here as:

For version 1.2 superblocks, the preferred way to create arrays is
by using a name instead of a number.  For example, if the array is
your home partition, then creating the array with the option
--name=home will cause the array to be assembled with a random
device number (which is what you are seeing now, when an array
doesn't have an assigned number we start at 127 and count
backwards), but there will be a symlink in /dev/md/ that points to
whatever number was used to assemble the array.  The symlink in
/dev/md/ will be whatever is in the name field of the superblock.
So in this example, you would have /dev/md/home that would point
to /dev/md127 and the preferred method of use would be to access
the device via the /dev/md/home entry.

Fun fact: I had given up on my previous setup and was going to redo the entire RAID. One of the changes I planned to make was to not use partitions and try to just make a RAID volume from a couple of complete drives (/dev/sd[ab]). What shocked me was that when I created the array as such, it started mirroring the data that was on one of the drives which was left over from a set that I’d previously installed an OS on using /dev/sda1 and /dev/sdb1. The fact that this was possible and automatic is either extremely wonderful or unnerving.

Sysresccd Problems

Sadly, I believe that sysresccd has been responsible for some of the problems I’ve been experiencing. It seems that it aggressively tries to automatically assemble RAID volumes and that’d be fine except that it seems to corrupt them in certain cases.

Knoppix

modprobe md
modprobe raid1
fdisk /dev/sda <= Linux raid autodetect
fdisk /dev/sdb <= Linux raid autodetect
mdadm --verbose --create --level=1 --raid-devices=2 \
      --metadata=0.90 /dev/md0 /dev/sda1 /dev/sdb1
watch cat /proc/mdstat # and wait until finished.
mkfs.ext3 -v -L RAID1VOL /dev/md0
mount /dev/md0 /mnt
ls -al /mnt
date > /mnt/integrity_check
cd ~ ; umount /dev/md0
mdadm --stop /dev/md0
shutdown -r now
  *REBOOT*
modprobe md
modprobe raid1
mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 # /dev/md0 auto created
mkdir /mnt/raid /mnt/backup
mount /dev/md0 /mnt/raid
cat /mnt/raid/integrity_check; rm /mnt/raid/integrity_check
mount /dev/sdd3 /mnt/backup  # Find all your data again.
time rsync -a /mnt/backup/ /mnt/raid/

Alternatives To SysrescCD

SysRescCD's problems are discussed here and here and here.

Using Debian Live

F8
Memorex
Live <TAB>
memtest^H^H^H^H^H^H^H
sudo su -
cd /dev; mdadm -vA md0 sd[ab]1
mount /dev/md0 /mnt
mount -t proc none /mnt/proc
mount --rbind /dev /mnt/dev
chroot /mnt /bin/bash

OMG IT WORKED!

I used a debian-live-7.2-amd64-rescue.iso to set up a RAID1 with 0.90 metadata on 2 devices with partitions (i.e. /dev/sda1 and /dev/sdb1) of type fd. This was readable by Legacy Grub and the kernel was able to assemble it at boot without an initramfs.

Can Sysresc be used with RAID at all?

Sysresc is definitely aggressive about building RAID sets if it thinks there are any devices that could be assembled. In some cases this would be good but when you’re trying to erase one or set one up or both, it’s annoying.

It looks like RAID sets created with sysresc get a "homehost" of sysresccd, and then on boot (into sysresc) it thinks that it is on its home host and tries autoassembly. This is with the normal (now, post-2010) metadata version 1.2.

The first thing to do is to try to use the --stop function of mdadm as described. Also the --remove. What I found is that although /proc/mdstat was empty, when I configured a drive with fdisk to have the fd type, suddenly a RAID volume had been assembled (badly). I found that doing udevadm hwdb --update can help.

This Worked
mdadm --verbose --manage --stop /dev/md127
cat /proc/mdstat # make sure it is gone
mdadm --verbose --create /dev/md0 --homehost=anyhomehost --name=ssdset \
      --level=1 --raid-devices=2 /dev/sda /dev/sdb
watch cat /proc/mdstat # Waiting until finished is probably smart.
reboot

mdadm --verbose --manage --stop /dev/md127
mdadm --verbose --assemble /dev/md0 /dev/sda /dev/sdb
mdadm --misc --examine /dev/sd[ab]
mdadm --misc --detail /dev/md0

Should be good to mkfs and mount. On reboot, stop the bogus /dev/md127 or whatever it is and assemble correctly and everything should be there.

To build a statically linked mdadm on Gentoo (handy for rescue work), something like this should do it:

USE="static" ebuild /usr/portage/sys-fs/mdadm/mdadm-3.2.6-r1.ebuild compile

Then look for a static executable in

/var/tmp/portage/sys-fs/mdadm-3.2.6-r1/work/mdadm-3.2.6

Using Hardware RAID On Linux with 3Ware Cards

Notes about hardware RAID using a particular 3Ware card (which I don’t use any more, but maybe this information will still be useful).

Replacing a drive after a failure

1

First find the problem. Use a command much like this:

 :-> [hb][~]$ /sbin/tw_cli /c0 show

 Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
 ------------------------------------------------------------------------------
 u0    RAID-5    OK             -       -       64K     4656.51   ON     OFF
 u1    SPARE     OK             -       -       -       465.753   -      OFF

 Port   Status           Unit   Size        Blocks        Serial
 ---------------------------------------------------------------
 p0     OK               u0     465.76 GB   976773168     WD-WCANU1306978
 p1     OK               u0     465.76 GB   976773168     WD-WCANU1241597
 p2     OK               u0     465.76 GB   976773168     WD-WCANU1230955
 p3     OK               u0     465.76 GB   976773168     WD-WCANU1222179
 p4     OK               u0     465.76 GB   976773168     WD-WCANU1318737
 p5     OK               u0     465.76 GB   976773168     WD-WCANU1230683
 p6     OK               u0     465.76 GB   976773168     WD-WCANU1240889
 p7     OK               u0     465.76 GB   976773168     WD-WCANU1234675
 p8     OK               u0     465.76 GB   976773168     9QG0DRJH
 p9     SMART-FAILURE    u1     465.76 GB   976773168     WD-WCANU1231205
 p10    OK               u0     465.76 GB   976773168     WD-WCANU1255530
 p11    OK               u0     465.76 GB   976773168     WD-WCANU1059605

This shows that drive 9 is showing signs of flakiness; though it may keep working fine, it may not, and it should be replaced. If there is a problem drive, it will likely end up isolated in its own RAID unit, because the controller will have grabbed the good drive from the hot spare and started using it in the main array.

2

Now extract the bad drive. This can be very confusing. I have proven twice this month that the labels on hb are good. There are no labels on puzzlebox. It’s very important to pull the right drive, especially when things are already bad, so be careful.

 :-> [hb][~]$ /sbin/tw_cli /c0 show

 Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
 ------------------------------------------------------------------------------
 u0    RAID-5    OK             -       -       64K     4656.51   ON     OFF

 Port   Status           Unit   Size        Blocks        Serial
 ---------------------------------------------------------------
 p0     OK               u0     465.76 GB   976773168     WD-WCANU1306978
 p1     OK               u0     465.76 GB   976773168     WD-WCANU1241597
 p2     OK               u0     465.76 GB   976773168     WD-WCANU1230955
 p3     OK               u0     465.76 GB   976773168     WD-WCANU1222179
 p4     OK               u0     465.76 GB   976773168     WD-WCANU1318737
 p5     OK               u0     465.76 GB   976773168     WD-WCANU1230683
 p6     OK               u0     465.76 GB   976773168     WD-WCANU1240889
 p7     OK               u0     465.76 GB   976773168     WD-WCANU1234675
 p8     OK               u0     465.76 GB   976773168     9QG0DRJH
 p9     DRIVE-REMOVED    -      -           -             -
 p10    OK               u0     465.76 GB   976773168     WD-WCANU1255530
 p11    OK               u0     465.76 GB   976773168     WD-WCANU1059605

This is as expected and now the hot spare unit (u1) goes away.

3

Put a new drive in. Much easier than step #2 - don’t forget to bring a Phillips screwdriver to the machine room, by the way.

 :-> [hb][~]$ /sbin/tw_cli /c0 show

 Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
 ------------------------------------------------------------------------------
 u0    RAID-5    OK             -       -       64K     4656.51   ON     OFF

 Port   Status           Unit   Size        Blocks        Serial
 ---------------------------------------------------------------
 p0     OK               u0     465.76 GB   976773168     WD-WCANU1306978
 p1     OK               u0     465.76 GB   976773168     WD-WCANU1241597
 p2     OK               u0     465.76 GB   976773168     WD-WCANU1230955
 p3     OK               u0     465.76 GB   976773168     WD-WCANU1222179
 p4     OK               u0     465.76 GB   976773168     WD-WCANU1318737
 p5     OK               u0     465.76 GB   976773168     WD-WCANU1230683
 p6     OK               u0     465.76 GB   976773168     WD-WCANU1240889
 p7     OK               u0     465.76 GB   976773168     WD-WCANU1234675
 p8     OK               u0     465.76 GB   976773168     9QG0DRJH
 p9     OK               -      465.76 GB   976773168     WD-WCAS81281763
 p10    OK               u0     465.76 GB   976773168     WD-WCANU1255530
 p11    OK               u0     465.76 GB   976773168     WD-WCANU1059605

So now drive 9 is back in and it seems ok, but it’s not part of any unit. It’s basically doing nothing useful.

4

Assign the new drive to a hot spare group. This is actually the important and non-obvious part of the operation.

 :-> [hb][~]$ /sbin/tw_cli /c0 add type=spare disk=9
 Creating new unit on controller /c0 ...  Done. The new unit is /c0/u1.

 :-> [hb][~]$ /sbin/tw_cli /c0 show

 Unit  UnitType  Status         %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
 ------------------------------------------------------------------------------
 u0    RAID-5    OK             -       -       64K     4656.51   ON     OFF
 u1    SPARE     OK             -       -       -       465.753   -      OFF

 Port   Status           Unit   Size        Blocks        Serial
 ---------------------------------------------------------------
 p0     OK               u0     465.76 GB   976773168     WD-WCANU1306978
 p1     OK               u0     465.76 GB   976773168     WD-WCANU1241597
 p2     OK               u0     465.76 GB   976773168     WD-WCANU1230955
 p3     OK               u0     465.76 GB   976773168     WD-WCANU1222179
 p4     OK               u0     465.76 GB   976773168     WD-WCANU1318737
 p5     OK               u0     465.76 GB   976773168     WD-WCANU1230683
 p6     OK               u0     465.76 GB   976773168     WD-WCANU1240889
 p7     OK               u0     465.76 GB   976773168     WD-WCANU1234675
 p8     OK               u0     465.76 GB   976773168     9QG0DRJH
 p9     OK               u1     465.76 GB   976773168     WD-WCAS81281763
 p10    OK               u0     465.76 GB   976773168     WD-WCANU1255530
 p11    OK               u0     465.76 GB   976773168     WD-WCANU1059605

Now the controller knows that this is a hot spare. Everything looks good again.

5

Immediately go buy a replacement! For hb, there’s one in my office that should not be used for anything but hb and there’s one in the machine room locker which the full sys admin team can get at in my absence. They should both be there!

6

Bonus step: send the dead drive in to the manufacturer if it’s still under warranty. Can’t have too many of these things lying around.

Using Hardware RAID On Linux with Areca Cards

cli64

I usually put this utility at /root/cli64. To get help do this.

/root/cli64 help
/root/cli64 vsf -h

Show status of current raid environment. This is good for diagnostic checks.

disk info

Note that drive info may also work, but I’ve found that it sometimes does not, so disk info should be considered canonical.

If there is a failure you probably need to pull that failed disk out. It is important to get the correct number. The proper number for finding the physical disk is in the column labeled "Slot#".

There is password protection. I generally have no need for it and set it to "0000". The password must be set in the session that requires clearance, so just do this before proceeding.

set password=0000

However, I did confirm that the password does not need to be set to do a volume check. To check raid consistency in a volume (and possibly to trigger any deep I/O errors) try this.

vsf check vol=3
vsf info
vsf stopcheck

Note that the number seems to be the # when you do vsf info. So "1" for "vs0", and "3" for "vs2". Using vsf info will show you the progress of a running check.

CLI> vsf info
  # Name             Raid Name       Level   Capacity Ch/Id/Lun  State
===============================================================================
  1 vs0              rs0             Raid5   12000.0GB 00/00/00   Normal
  2 vs1              rs1             Raid5   12000.0GB 00/00/01   Normal
  3 vs2              rs2             Raid5   12000.0GB 00/00/02   Checking(0.2%)
===============================================================================
GuiErrMsg<0x00>: Success.

To set auto-activation of an incomplete RAID set (1 is on, 0 is off). Unfortunately, this doesn’t always work.

sys autoact p=1

To see what is going on with a "raid set":

rsf info raid=3

If you pull the bad drive and replace it, you might get a "Free" status. This doesn’t help anything. Either your previous hot spares are hard at work becoming primary working drives, or you will be activating them to do so. Either way, you want that "Free" drive to become a hot spare (assuming you weren’t dumb enough to set up a system without one). To do this you need the following command with the number of the drive. The number is important to get right and it is confusing: use the first column of the disk info output (labeled "#" at the CLI) to specify which drive to turn into a hot spare. This is (probably) different from the bay number you use when pulling a physical drive after a failure. For example, this turns the drive sitting in physical bay 17 into a hot spare.

rsf createhs drv=25

Sometimes there is a failure and the raid set just sits there. I think this tends to happen when the drive fails during a reboot (a time slightly more prone to failure). Here is a sequence where I check a raid set which contained a "Failed" drive that was replaced and now shows as "Free". That is not ok: you actually want the drive’s status to show "rs2" (or whatever raid set is correct) and the rsf info "Raid Set State" to be "Rebuilding".

CLI> rsf info raid=3
Raid Set Information
===========================================
Raid Set Name        : rs2
Member Disks         : 7
Total Raw Capacity   : 14000.0GB
Free Raw Capacity    : 14000.0GB
Min Member Disk Size : 2000.0GB
Raid Set State       : Incompleted
===========================================
GuiErrMsg<0x00>: Success.

CLI> rsf activate raid=3
GuiErrMsg<0x00>: Success.

CLI> rsf info raid=3
Raid Set Information
===========================================
Raid Set Name        : rs2
Member Disks         : 7
Total Raw Capacity   : 14000.0GB
Free Raw Capacity    : 0.0GB
Min Member Disk Size : 2000.0GB
Raid Set State       : Rebuilding
===========================================
GuiErrMsg<0x00>: Success.

How’s that rebuild going? Check with:

`cli64 vsf info`

If there are problems, check the log:

`cli64 event info`

Areca support likes to see what you’re running:

`cli64 sys info`
`cli64 sys showcfg`
Note
If a drive fails during a reboot, the RAID card doesn’t know what to make of the situation, so instead of automatically rebuilding from a hot spare, it will just do nothing. You have to activate it as shown above. This means that whenever the file server is booted, a drive status report should be generated to make sure everything is starting out properly.
Note
On fs11 (data), when the array contained a failed drive, all the red lights were on. Removing the bad drive turned all the LEDs off. Also, I couldn’t get drive identify drv=? to work. However, on this particular machine some forward-thinking person (me) had labeled all the bays with the correct "Slot#". Finally, when I put the new drive in with 7 drives already functioning as a "Normal" RAID set, the new drive seems to have been automatically configured as a "HotSpare".

Using Hardware RAID On Linux with LSI Cards (Dell)

A typical Dell server has the following RAID controller.

RAID bus controller:
LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)

Downloads

Download manual and MegaCLI program at:

Put "MegaCLI" in the "keyword" field of the search (nothing else).

Look for the one at the top called MegaCLI 5.5 T2. I think the rest are not useful or necessary. This creates a file called 8-07-14_MegaCLI.zip which contains binaries for many platforms. The Linux one is really an RPM (good job with the double compression).

The RPM installs the following files.

/opt/MegaRAID/MegaCli/MegaCli
/opt/MegaRAID/MegaCli/MegaCli64
/opt/MegaRAID/MegaCli/libstorelibir-2.so.14.07-0

If trying to use this on a (64bit!) SysrescCD (or some non Red Hat system) use

rpm -ivh --nodeps MegaCli-8.07.14-1.noarch.rpm

This will put the correct files in place in /opt.

Or to just extract the files you need directly with no RPM installation do something like this.

rpm2cpio MegaCli-8.07.14-1.noarch.rpm | cpio --extract --list
rpm2cpio MegaCli-8.07.14-1.noarch.rpm | cpio --extract --make-directories --verbose

Usage

This general summary from Cisco shows a lot of good usage tips for MegaCLI including the replacing of a drive.

Enclosure Info

sudo /opt/MegaRAID/MegaCli/MegaCli64 -EncInfo -aALL

Note the field here "Device ID" which in this case is "32". This number can be used in the "E" place when drives are specified with the "[E:S]" syntax.

Battery Backup Unit Check

sudo /opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -aALL

Configuration Settings

sudo /opt/MegaRAID/MegaCli/MegaCli64 -CfgDsply -aALL

Adapter Info (Entire Device)

sudo /opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aALL

This specifically looks interesting.

sudo /opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -aALL  | grep 'Errors *:'

Though I don’t know exactly what it means.

Logical Disk Checks

sudo /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -aALL

Physical Device Checks

sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL
sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDInfo -PhysDrv '[32:7]' -aALL

In the last one, the 32 is the "enclosure" number and the 7 is the slot. This basically shows all info about slot 7.

Replacing

Set state to offline. Note that this may just occur naturally after replacing a drive. If the drive is offline, see the next command.

/opt/MegaRAID/MegaCli/MegaCli64 -PDOffline -PhysDrv '[32:7]' -aN

I did this and the hot spare’s firmware state went to "Rebuild".

Mark as missing is probably only useful if the drive really is missing. Skip this if you’re just planning a drive swap.

/opt/MegaRAID/MegaCli/MegaCli64 -PDMarkMissing -PhysDrv[32:7] -a0

"Prepare for removal" did seem to work, and I believe it spins down the drive so you’re not yanking out a spinning drive. I believe it also turns off the bottom LED on the enclosure.

/opt/MegaRAID/MegaCli/MegaCli64 -PdPrpRmv -PhysDrv[32:7] -a0

Blink the LED of the drive. It might turn out that the light is already blinking so this gives no peace of mind. Try blinking the neighbors to home in on exactly the bay you think you’re working with.

/opt/MegaRAID/MegaCli/MegaCli64 -PDLocate -PhysDrv '[32:1]' -aN
/opt/MegaRAID/MegaCli/MegaCli64 -PDLocate stop -PhysDrv '[32:1]' -aN
Adapter: 0: Device at EnclId-32 SlotId-1  --
PD Locate Start Command was successfully sent to Firmware

Replace missing drive only if there’s no hot spare taking over, i.e. the array is degraded and needs to be fixed with this new drive.

/opt/MegaRAID/MegaCli/MegaCli64 -PdReplaceMissing -PhysDrv[32:7] -ArrayN -rowN -aN

If there is a hot spare that has just switched over when you took the bad actor offline (or it happened automatically), then this new drive will need to become the hot spare. That is described below in the Hot Spare section.

I don’t know what the N variables here are. I didn’t do any of this. I just yanked the bad drive out and started from here.

Here’s what I did do. After physically replacing the drive, its PDInfo can be found like this (this example is for enclosure 32, drive 7, adapter 0).

sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDInfo -PhysDrv[32:7] -a0

Look for "Firmware state:", which might say rebuilding. Eventually it stops at "offline". This leaves the logical disk information still showing the set as "degraded". Check for that with this.

sudo /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -L1 -a0

If the drive says "offline" and you want it to be participating in the RAID do something like this.

sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDOnline -PhysDrv[32:7] -a0

Note that this is the syntax that worked for me (no quoting of the brackets even). This produced this message.

EnclId-32 SlotId-7 state changed to OnLine.

Once you have all the drives "online" check to see if the State is still "degraded". The State should be "Optimal".

See page 254 of the manual for more information.

Hot Spare

When I first got to the RAID array on example.edu, drive number 11 had the following condition.

Slot Number: 11
Firmware state: Unconfigured(good), Spun Up

At first I didn’t understand what this meant and figured the drive might be automatically usable as a hot spare in this condition. However, I began to suspect it needed to be explicitly set. That attempt initially failed because of the "Unconfigured" state.

:-< [example.edu][~/RAID]$ sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDHSP -Set -Dedicated -Array0 -PhysDrv[32:11] -a0

Adapter: 0: Set Physical Drive at EnclId-32 SlotId-11 as Hot Spare Failed.

FW error description:
  The specified device is in a state that doesn't support the requested command.

  Exit Code: 0x32

The trick is to "clear" the drive. Obviously be very careful about which drive you choose to clear. Here I am clearing drive #11.

:-< [example.edu][~/RAID]$ sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDClear -Start -PhysDrv[32:11] -a0

Started clear progress on device(Encl-32 Slot-11)
:-> [example.edu][~/RAID]$ sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDHSP -Set -PhysDrv[32:11] -a0

Adapter: 0: Set Physical Drive at EnclId-32 SlotId-11 as Hot Spare Failed.

FW error description:
  The current operation cannot be performed because the physical drive clear is in progress.

Exit Code: 0x25
:-> [example.edu][~/RAID]$ sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDClear -ShowProg -PhysDrv[32:11] -a0

Clear Progress on Device at Enclosure 32, Slot 11 Completed 3% in 8 Minutes.

The next morning, when the drive has finished clearing:

:-> [example.edu][~/RAID]$ sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDClear -ShowProg -PhysDrv[32:11] -a0

Device(Encl-32 Slot-11) is not in clear process

And now that it’s "clear" the hot spare can be set.

:-> [example.edu][~/RAID]$ sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDHSP -Set -PhysDrv[32:11] -a0

Adapter: 0: Set Physical Drive at EnclId-32 SlotId-11 as Hot Spare Success.

Now when I check the drives, all of them are "Online, Spun Up" except number 11, which has this state.

Slot Number: 11
Firmware state: Hotspare, Spun Up

Other Stuff Which Maybe Should Be Used

The number N of the array parameter is the Span Reference you get using

/opt/MegaRAID/MegaCli/MegaCli64 -CfgDsply -aALL

and the number N of the row parameter is the Physical Disk in that span or array starting with zero (it’s not the physical disk’s slot!).
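So a filled-in version of the earlier -PdReplaceMissing command might look like this, assuming span 0 and row 0 (these numbers are hypothetical; read the real ones off -CfgDsply first):

/opt/MegaRAID/MegaCli/MegaCli64 -PdReplaceMissing -PhysDrv[32:7] -Array0 -row0 -a0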

Rebuild drive - Drive status should be "Firmware state: Rebuild"
/opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -Start -PhysDrv '[32:7]' -aN
/opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -Stop -PhysDrv '[32:7]' -aN
/opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ShowProg -PhysDrv '[32:7]' -aN
/opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ProgDsply -physdrv '[32:7]' -aN

Disks

The disks in example.edu appear to be these.

They are Dell branded Western Digital 2TB, 3.5", 7200 RPM, SATA II, 64MB Cache

WD2003FYYS

Specifically -

WD-WMAY04398245WDC WD2003FYYS-18W0B0                   01.01D02

The drive bay labels are consistent with this.

Output

check_drives
sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL | egrep 'Slot|Firmware state'

When the drives are good (#7 has just been replaced) it looks like this.

:-> [example.edu][~/RAID]$ ./check_drives
Slot Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Firmware state: Online, Spun Up
Slot Number: 2
Firmware state: Online, Spun Up
Slot Number: 3
Firmware state: Online, Spun Up
Slot Number: 4
Firmware state: Online, Spun Up
Slot Number: 5
Firmware state: Online, Spun Up
Slot Number: 6
Firmware state: Online, Spun Up
Slot Number: 7
Firmware state: Rebuild
Slot Number: 8
Firmware state: Online, Spun Up
Slot Number: 9
Firmware state: Online, Spun Up
Slot Number: 10
Firmware state: Online, Spun Up
Slot Number: 11
Firmware state: Unconfigured(good), Spun Up

A failure looks like this.

Slot Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Firmware state: Online, Spun Up
Slot Number: 2
Firmware state: Online, Spun Up
Slot Number: 3
Firmware state: Online, Spun Up
Slot Number: 4
Firmware state: Online, Spun Up
Slot Number: 5
Firmware state: Online, Spun Up
Slot Number: 6
Firmware state: Online, Spun Up
Slot Number: 7
Firmware state: Failed
Slot Number: 8
Firmware state: Online, Spun Up
Slot Number: 9
Firmware state: Online, Spun Up
Slot Number: 10
Firmware state: Online, Spun Up
Slot Number: 11
Firmware state: Unconfigured(good), Spun Up

The correct status looks like this.

Slot Number: 0
Firmware state: Online, Spun Up
Slot Number: 1
Firmware state: Online, Spun Up
Slot Number: 2
Firmware state: Online, Spun Up
Slot Number: 3
Firmware state: Online, Spun Up
Slot Number: 4
Firmware state: Online, Spun Up
Slot Number: 5
Firmware state: Online, Spun Up
Slot Number: 6
Firmware state: Online, Spun Up
Slot Number: 7
Firmware state: Online, Spun Up
Slot Number: 8
Firmware state: Online, Spun Up
Slot Number: 9
Firmware state: Online, Spun Up
Slot Number: 10
Firmware state: Online, Spun Up
Slot Number: 11
Firmware state: Hotspare, Spun Up

An even better check.

sudo /opt/MegaRAID/MegaCli/MegaCli64 -PDList -aALL |\
grep -i '\(alert\)\|\(failure\)\|\(error\)\|\(firmware state\)\|\(attention\)\|\(slot\)' |\
 sed '/^Slot/s/^/== /'

Here’s a full script to run from cron. Run with no args, it’s silent unless there are problems. Run with any kind of arg, it prints the full report. Run the no-arg version every 15 minutes or so and the arg version once a day to get a positive assertion that the check is working.

#!/bin/bash

RAIDCMD=/opt/MegaRAID/MegaCli/MegaCli64

function full_report {
        date
        ${RAIDCMD} -PDList -aALL |\
        grep -i '\(alert\)\|\(failure\)\|\(error\)\|\(firmware state\)\|\(attention\)\|\(slot\)' |\
         sed '/^Slot/s/^/== /'
}

if [ -n "$1" ]; then
    full_report # Full report always. Run once per day.
else
    # No args - Silent unless there's a problem. Run every 15 min.
    if ! ${RAIDCMD} -PDList -aALL | grep ^Firmware | uniq | awk '{if (NR > 1) {print NR; exit 1}}'; then
        echo "THERE IS MORE THAN ONE DRIVE STATE. A DISK MAY HAVE DEGRADED."
        full_report
    fi
fi

Yes, this script is uncool because there should be a hot spare, which makes for 2 different states even when no drives have failed. But hey, I didn’t set that up. Just keep it in mind and adjust the script as necessary.
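The schedule described above might look like this in root’s crontab; the script path is an assumption:

# Silent check every 15 minutes.
*/15 * * * * /root/bin/check_raid
# Full report once a day (any argument triggers it).
0 8 * * * /root/bin/check_raid full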