RAIDframe

Intro
TODO
Status
Download
How-to
Performance
Credits

Intro
This project brings RAIDframe to FreeBSD, with some extras. It is based on the NetBSD RAIDframe port by Greg Oster, which in turn is based on the RAID research and prototyping tool developed by the Parallel Data Laboratory at Carnegie Mellon University. Both this port and the NetBSD port are based on RAIDframe version 1.1 and provide the following features:
  • Threaded RAID engine contained within the kernel.
  • Fast, solid, and tested performance that is well-suited for a production environment.
  • Well documented and flexible framework that is ideal for RAID research and prototyping.
  • Large number of RAID features, like RAID 0/1/4/5/6, hot spares, parity logging, etc. Through the configuration utility, spares can be hot added, components can be failed, reconstruction started/monitored, and parity regenerated.
  • Autoconfiguration of arrays at boot time, and root mounting of these arrays.
  • Independence from the lower level device. Arrays can be constructed out of any block device, whether it's IDE, SCSI, or another array.
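
As a rough illustration of what "parity" and "reconstruction" mean in the feature list above, the following sketch computes an XOR parity value over three data values and then rebuilds one of them from the survivors. The byte values and variable names are made up for the example, and this is not RAIDframe code; the real engine works on stripe units inside the kernel.

```shell
#!/bin/sh
# Sketch only: XOR parity over three hypothetical one-byte "components".
d0=165
d1=60
d2=15

# Parity is the XOR of all data values.
parity=$(( d0 ^ d1 ^ d2 ))

# Pretend the component holding d1 failed: XOR the survivors with the
# parity value to get the lost data back.
rebuilt=$(( d0 ^ d2 ^ parity ))

echo "parity=$parity rebuilt_d1=$rebuilt"
```

The same property is why a parity array can keep running with one failed component, and why parity has to be rewritten before it can be trusted.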

The code is relatively unchanged from NetBSD with the exception of the kernel interface, which has undergone some major work. RAID devices are now dynamically created through a raid control device, eliminating the compile-time constraints present in NetBSD.

TODO
  • Stability. No testing has been done under SMP in the 5-CURRENT branch. The original 4-STABLE work was developed exclusively on an SMP machine, but the differences between 4.x and 5.x are huge. Error recovery also has not been extensively tested.
  • Push DAG selection to the kthread, or pre-allocate DAG selection resources. In the current code, the strategy() path leads to code that selects the correct DAG for the operation. Unfortunately, it does several mallocs during this process. If the mallocs cannot be avoided, then this code needs to be pushed to the kthread.
  • Rework the locking semantics. A very pervasive problem in RAIDframe is that mutexes are used to serialize code paths, not protect data structures. The code needs a good audit to change mutexes to SX locks where appropriate, or rework the locking in general.
  • Unify and clean up the debug printing. It's a mess. Several entirely different mechanisms exist. NetBSD worked on this a while back, so maybe that work can be leveraged.
  • Make the modular parts of the system modular. The queuing policy and RAID modules are statically declared and compiled in. A registration mechanism needs to be created so they can live as separate modules.
  • Abstract and modularize the on-disk metadata. This will allow RF to work with different on-disk metadata formats, which will lead to nice things like HostRAID support with the Adaptec SCSI cards.
  • Write a disk concatenation (aka 'Volume set') RAID module. This would be a great project to prove how modular and well documented RF is.
  • GEOM-ification. This is needed for autoconfiguration to work.

Status
2002-10-20
RAIDframe is now in the 5-CURRENT branch of FreeBSD. From now on, patches will probably only appear on this webpage if they are experimental. I will, however, maintain a TODO list here.

2002-10-06
Partial support for the new GEOM block layer in FreeBSD. Autoconfiguration is disabled until I adapt it. This will cause several warnings when compiling the code, but they are harmless. This also uses the alt kstack functionality that I recently added to FreeBSD, so there is no longer a need to set the KSTACK_PAGES option in your kernel.
NOTE: This patch also contains required patches to the GEOM code. Once applied, your kernel needs to be rebuilt. The GEOM patches will probably be checked into FreeBSD in the next few days, making the ones here obsolete.

2002-09-23
This drop fixes a number of issues:
  • Adapted for the recent disk layer changes. Generic disk ioctl behavior should also work better now.
  • Several locking fixes from NetBSD
  • Removed a dirty hack relating to malloc.
Also, autoconfiguration seems to work as long as the disks to be configured contain an MBR.

2002-09-13
Finally, patches for FreeBSD 5.0-current. Read the following:
  • This is still highly unstable and a work in progress. It's stable enough for me to do some heavy I/O without panicking. However, it is far from release quality.
  • I have not tested autoconfiguration at all. It will probably panic.
  • I have only tested it as a loadable module, not compiled into the kernel. It will probably panic.
  • Error recovery is known to be broken. It will probably panic.
  • Unloading it as a module will probably panic.
  • This has not been tested with an SMP machine. It will probably panic.
  • You'll probably want to recompile your kernel with options KSTACK_PAGES=4. Not doing so will probably panic, especially when using SCSI drives.

2002-06-26
Yes, I am still working on bringing RAIDframe to FreeBSD.
While I do not yet have patches ready for public consumption, work has been progressing. Witness violations, deadlocks, and panics are being worked out, and I hope to release something by the July 4 break. In the meantime, I would happily accept donations of Ultra160 SCSI drives, so I can tax the transaction throughput of the code as much as possible. Please email me if you can help.

2001-08-28
Ok, I'm a bum and haven't been working on this lately. To make matters worse, the last set of diffs was not produced correctly. This update fixes that and resolves a minor conflict with the latest 4.4-RC. Thanks to Xia Tao for pointing this out.

2001-07-10
Another minor update. This removes the standard disk ioctls from the code, since that functionality is handled adequately in the disk layer. This also freshens up the diffs between other system files.

2001-06-20
Panic fix. Due to changes in the way that proc0 and init are started now, proc0 has no root or curdir vnode set. Since kthreads are children of proc0, they are missing this also. This was causing a panic when manually configuring an array, since namei() needs these vnodes. Auto-configure wasn't affected because it doesn't use namei().

2001-06-18
Minor update. A possible panic when configuring an array has hopefully been fixed (thanks to William Carrel for pointing this out). Also reworked the printing of debug information; setting RAID_DEBUG to a value between 0 and 3 will give you increasing levels of output, with 0 being the default value.

2001-06-15
Autoconfiguration works. You must be running -stable as of 2001-06-15 for this to work! Anything earlier will result in a panic when you boot and an array is marked for auto-configure. To mark an array for auto-configure, use raidctl -A yes raiddev. Also cleaned up the raidctl.8 manpage and did some more general clean-ups to the code.

2001-06-11
Added the raid.4 manpage. The driver is now auto-loaded when you run raidctl. It can also be statically compiled into the kernel now by adding the following line to your config:

pseudo-device raidframe

2001-06-07
The basic port, minus auto-config and root-mounting, is nearly complete. I consider it to be BETA quality, so don't try putting anything that you value on it (web server, PGP keys, recipes, nuclear launch codes, etc). Also, this code is based on 4.3-stable and will not compile under -current! I will port this over to -current in the next few days. Unless something else in -current blows up. Again.

Download
Download and install this patch at /usr.
2002-10-20-RAIDframe-current.diff.gz <=== FreeBSD 5-CURRENT

2001-08-28-RAIDframe-stable.diff.gz <=== FreeBSD 4-STABLE

How-to
Note: The following paragraphs were originally written for things as they applied to FreeBSD 4-stable. The ideas are the same for 5-current, though details are slightly different. I will update this later.
  1. You must update your kernel sources to -stable as of 2001-06-15 or later, or else none of this will work. This applies even if you are only compiling the driver as a module: the kernel itself must be from 2001-06-15 or later.
  2. Go to /usr and apply the patch with the following command: gzcat patchfile | patch -p
  3. You may want to make world since a number of apps are affected by the changes to /sys/sys/disklabel.h. At the very least you should copy /sys/sys/disklabel.h to /usr/include/sys/disklabel.h and remake /usr/src/sbin/disklabel.
  4. You will also want to install /usr/src/etc/MAKEDEV into /dev/MAKEDEV and make the raidctl and raid* devices.
  5. Add the following lines to your kernel config file and compile:

    pseudo-device raidframe
    options RAID_AUTOCONFIG


  6. Alternatively, it can be compiled as a module. Go to /sys/modules/raidframe and build it with make && make install. Edit the Makefile if you want auto-configure turned on.
  7. Go to /usr/src/sbin/raidctl and do make && make install.
  8. The raidctl.8 manpage has an excellent description of how to set up an array. In short, you need to do the following things:
    1. Select the disks that you wish to experiment with and slice them appropriately with FreeBSD slices. Note that it is possible, but untested, to create multiple BSD slices per disk and have RAIDframe treat them as separate components.
    2. Disklabel the slices and create an equally sized partition of fstype raid for each slice.
    3. Follow the raidctl.8 manpage to construct a config file that names the component partitions and the array properties.
    4. Run raidctl -C config. A whole lot of information should spew out of the console about the array you just created. The fatal errors are ok to ignore as long as the console messages say that they were ignored. Note also that the console should mention that raid0 was created.
    5. Run raidctl -I 12345 raid0 to initialize the serial number on the array.
    6. Run raidctl -iv raid0 to initialize the array and rewrite the parity. This is necessary even for RAID0 and RAID1 arrays.
    7. You should now be able to fdisk and label the /dev/raid0 device just like any other raw device.
  9. An array can be marked for auto-configuration by using raidctl -A yes raid0.
  10. You can safely shut down the array by running raidctl -u raid0 or by unloading the raidframe module.
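
The array creation part of step 8 can be sketched as a small shell script. Treat it as an illustration only: the component names (/dev/da0s1e and friends), the file name raid0.conf, and the layout numbers are hypothetical placeholders, and the config layout is my reading of the format described in raidctl.8, which remains the authority. The script writes the config file and prints the raidctl commands from the steps above instead of running them, so it is safe to try on a machine without spare disks.

```shell
#!/bin/sh
# Sketch only: write a sample RAIDframe config file and print (not run)
# the raidctl commands from step 8. Device names and layout values are
# hypothetical; replace them with your own components.

CONF="${1:-raid0.conf}"

# Write a three-component RAID 5 config in the raidctl.8 file format.
write_config() {
    cat > "$1" <<'EOF'
START array
# numRow numCol numSpare
1 3 0

START disks
/dev/da0s1e
/dev/da1s1e
/dev/da2s1e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
32 1 1 5

START queue
fifo 100
EOF
}

# Print the commands from steps 8.4 through 8.6 in order.
print_steps() {
    echo "raidctl -C $1"
    echo "raidctl -I 12345 raid0"
    echo "raidctl -iv raid0"
}

write_config "$CONF"
print_steps "$CONF"
```

Once the printed commands have been reviewed, they can be run by hand as root; the serial number 12345 is the same example value used in step 8.5.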

Performance
Early performance tests show ~66MB/s reading and ~49MB/s writing through the filesystem on a 3-disk RAID 0 array composed of Quantum Ultra160 drives. Feel free to submit your performance numbers, along with your dmesg output. The driver does provide stats to the devstat facility, so iostat can be used to monitor performance.

Credits
Much praise and many thanks to Greg Oster for his NetBSD port and his patience in answering my countless questions.
Thanks to William Carrel for helping me track down a nasty panic that occurs when manually configuring an array.