RAIDframe
Intro
This project brings RAIDframe to FreeBSD, with some extras. It is based on the
NetBSD RAIDframe port by Greg Oster, which in turn is based on the RAID
research and prototyping tool developed by the Parallel Data Laboratory at
Carnegie Mellon University. This port, and the NetBSD port, are based on
RAIDframe version 1.1 and provide the following features:
- Threaded RAID engine contained within the kernel.
- Fast, solid, and tested performance that is well-suited for a production environment.
- Well documented and flexible framework that is ideal for RAID research and
prototyping.
- Large number of RAID features, like RAID 0/1/4/5/6, hot spares, parity
logging, etc. Through the configuration utility, spares can be hot added,
components can be failed, reconstruction started/monitored, and parity
regenerated.
- Autoconfiguration of arrays at boot time, and root mounting of these arrays.
- Independence from the lower level device. Arrays can be constructed out of
any block device, whether it's IDE, SCSI, or another array.
The code is relatively unchanged from NetBSD with the exception of the
kernel interface, which has undergone some major work. RAID devices are now
dynamically created through a raid control device, eliminating the compile-time
constraints present in NetBSD.
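As a rough illustration of the management operations listed above, the sketch
below strings together the corresponding raidctl invocations. The flags
(-a, -F, -S, -P) are taken from the NetBSD raidctl(8) manpage and are assumed
to carry over unchanged to this port; the array name raid0 and the partition
names are hypothetical. By default the script only prints the commands; set
RUN=1 to actually execute them.

```shell
#!/bin/sh
# Print each command instead of executing it unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

ARRAY=raid0          # hypothetical array name
SPARE=/dev/da3s1e    # hypothetical spare partition
BAD=/dev/da1s1e      # hypothetical component to fail

manage_array() {
    run raidctl -a "$SPARE" "$ARRAY"   # hot-add a spare
    run raidctl -F "$BAD" "$ARRAY"     # fail a component and reconstruct onto the spare
    run raidctl -S "$ARRAY"            # monitor reconstruction progress
    run raidctl -P "$ARRAY"            # check parity and rewrite it if needed
}

manage_array
```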
TODO
- Stability. No testing has been done under SMP in the 5-CURRENT branch.
The original 4-STABLE work was developed exclusively on an SMP machine, but
the differences between 4.x and 5.x are huge. Error recovery also has not
been extensively tested.
- Push DAG selection to the kthread, or pre-allocate DAG selection resources.
In the current code, the strategy() path leads to code that selects the
correct DAG for the operation. Unfortunately, it does several mallocs
during this process. If the mallocs cannot be avoided, then this code needs
to be pushed to the kthread.
- Rework the locking semantics. A very pervasive problem in RAIDframe is
that mutexes are used to serialize code paths, not protect data structures.
The code needs a good audit to change mutexes to SX locks where appropriate,
or rework the locking in general.
- Unify and clean up the debug printing. It's a mess. Several entirely
different mechanisms exist. NetBSD worked on this a while back, so maybe
that work can be leveraged.
- Make the modular parts of the system modular. The queuing policy and
RAID modules are statically declared and compiled in. A registration
mechanism needs to be created so they can live as separate modules.
- Abstract and modularize the on-disk metadata. This will allow RAIDframe to
work with different on-disk metadata formats, which will lead to nice things
like HostRAID support with the Adaptec SCSI cards.
- Write a disk concatenation (aka 'Volume set') RAID module. This would
be a great project to prove how modular and well documented RAIDframe is.
- GEOM-ification. This is needed for autoconfiguration to work.
Status
2002-10-20
RAIDframe is now in the 5-CURRENT branch of FreeBSD.
From now on, patches will probably only appear on this webpage if they are
experimental. I will, however, maintain a TODO list here.
2002-10-06
Partial support for the new GEOM block layer in FreeBSD. Autoconfiguration is
disabled until I adapt it. This will cause several warnings when compiling the
code, but they are harmless. This also uses the alt kstack functionality that
I recently added to FreeBSD, so there is no longer a need to set the
KSTACK_PAGES option in your kernel.
NOTE: This patch also contains required patches to the
GEOM code. Once applied, your kernel needs to be rebuilt. The GEOM patches
will probably be checked into FreeBSD in the next few days, making the ones
here obsolete.
2002-09-23
This drop fixes a number of issues:
- Adapted for the recent disk layer changes. Generic disk ioctl behavior
should also work better now.
- Several locking fixes from NetBSD.
- Removed a dirty hack relating to malloc.
Also, autoconfiguration seems to work as long as the disks to be configured
contain an MBR.
2002-09-13
Finally, patches for FreeBSD 5.0-current. Read the following:
- This is still highly unstable and a work in progress. It's stable enough
for me to do some heavy I/O without panicking. However, it is far from
release quality.
- I have not tested autoconfiguration at all. It will probably panic.
- I have only tested it as a loadable module, not compiled into the kernel.
It will probably panic.
- Error recovery is known to be broken. It will probably panic.
- Unloading it as a module will probably panic.
- This has not been tested with an SMP machine. It will probably panic.
- You'll probably want to recompile your kernel with options KSTACK_PAGES=4.
Not doing so will probably panic, especially when using SCSI drives.
2002-06-26
Yes, I am still working on bringing RAIDframe to FreeBSD.
While I do not yet have patches ready for public consumption, work has
been progressing. Witness violations, deadlocks, and panics are being
worked out, and I hope to release something by the July 4 break. In the
meantime, I would happily accept donations of Ultra160 SCSI drives, so
I can tax the transaction throughput of the code as much as possible.
Please email me if you can help.
2001-08-28
Ok, I'm a bum and haven't been working on this lately. To make matters worse,
the last set of diffs was not produced correctly. This update fixes that
and resolves a minor conflict with the latest 4.4-RC. Thanks to Xia Tao
for pointing this out.
2001-07-10
Another minor update. This removes the standard disk ioctls from the code,
since that functionality is handled adequately in the disk layer. This also
freshens up the diffs between other system files.
2001-06-20
Panic fix. Due to changes in the way that proc0 and init are started now,
proc0 has no root or curdir vnode set. Since kthreads are children of proc0,
they are missing this also. This was causing a panic when manually configuring
an array, since namei() needs these vnodes. Auto-configure wasn't affected
because it doesn't use namei().
2001-06-18
Minor update. A possible panic when configuring an array has hopefully been
fixed (thanks to William Carrel for pointing this out). Also reworked the
printing of debug information; setting RAID_DEBUG to a value between 0 and 3
will give you increasing levels of output, with 0 being the default value.
2001-06-15
Autoconfiguration works. You must be running -stable as of 2001-06-15 for
this to work! Anything earlier will result in a panic when you boot and
an array is marked for auto-configure. To mark an array for auto-configure,
use raidctl -A yes raiddev. Also cleaned up the raidctl.8 manpage and did
some more general clean-ups to the code.
2001-06-11
Added the raid.4 manpage. The driver is now auto-loaded when you run
raidctl. It can also be statically compiled into the kernel now by adding the
following line to your config:
pseudo-device raidframe
2001-06-07
The basic port, minus auto-config and root-mounting, is nearly complete.
I consider it to be BETA quality, so don't try putting anything that you value
on it (web server, PGP keys, recipes, nuclear launch codes, etc). Also,
this code is based on 4.3-stable and will not compile under
-current! I will port this over to -current in the next few days.
Unless something else in -current blows up. Again.
Download
Download the patch for your branch and apply it from /usr.
- 2002-10-20-RAIDframe-current.diff.gz (FreeBSD 5-CURRENT)
- 2001-08-28-RAIDframe-stable.diff.gz (FreeBSD 4-STABLE)
How-to
Note: The following paragraphs were originally written for things as
they applied to FreeBSD 4-stable. The ideas are the same for 5-current,
though details are slightly different. I will update this later.
- You must update your kernel sources to 2001-06-15 of -stable or later, or
else none of this will work. If you are only compiling the driver as a
module, your kernel must be from 2001-06-15 or later.
- Go to /usr and apply the patch with the following command:
gzcat patchfile | patch -p
- You may want to make world since a number of apps are affected by the
changes to /sys/sys/disklabel.h. At the very least you should copy
/sys/sys/disklabel.h to /usr/include/sys/disklabel.h and remake
/usr/src/sbin/disklabel.
- You will also want to install /usr/src/etc/MAKEDEV into /dev/MAKEDEV and
make the raidctl and raid* devices.
- Add the following lines to your kernel config file and compile:
pseudo-device raidframe
options RAID_AUTOCONFIG
- Alternatively, it can be compiled as a module. Go to /sys/modules/raidframe
and build it with make && make install. Edit the Makefile if you want
auto-configure turned on.
- Go to /usr/src/sbin/raidctl and do make && make install.
- The raidctl.8 manpage has an excellent description of how to set up an
array. In short, you need to do the following things:
- Select the disks that you wish to experiment with and slice them
appropriately with FreeBSD slices. Note that it is possible, but untested,
to create multiple BSD slices per disk and have RAIDframe treat them as
separate components.
- Disklabel the slices and create an equally sized partition of fstype raid
for each slice.
- Follow the raidctl.8 manpage to construct a config file that names the
component partitions and the array properties.
- Run raidctl -C config. A whole lot of information should spew out of the
console about the array you just created. The fatal errors are ok to ignore
as long as the console messages say that they were ignored. Note also that
the console should mention that raid0 was created.
- Run raidctl -I 12345 raid0 to initialize the serial number on the array.
- Run raidctl -iv raid0 to initialize the array and rewrite the parity. This
is necessary even for RAID0 and RAID1 arrays.
- You should now be able to fdisk and label the /dev/raid0 device just
like any other raw device.
- An array can be marked for auto-configuration by using
raidctl -A yes raid0.
- You can safely shut down the array by running raidctl -u raid0 or by
unloading the raidframe module.
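The array-creation steps above can be sketched as a small script. The config
file layout follows the examples in the NetBSD raidctl(8) manpage and is
assumed to apply unchanged here; the component names (/dev/da0s1e and so on),
the stripe-unit size of 32 sectors, the choice of RAID 5, and the config file
path are all illustrative assumptions rather than values from this port. The
script only prints the raidctl commands unless RUN=1 is set.

```shell
#!/bin/sh
# Print each command instead of executing it unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Hypothetical location for the config file.
CONF="${CONF:-${TMPDIR:-/tmp}/raid0.conf}"

# Write a config for three components, no spares, RAID 5 (all assumed values).
write_config() {
    cat > "$1" <<'EOF'
START array
# numRow numCol numSpare
1 3 0

START disks
/dev/da0s1e
/dev/da1s1e
/dev/da2s1e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
32 1 1 5

START queue
fifo 100
EOF
}

create_array() {
    write_config "$CONF"
    run raidctl -C "$CONF"        # initial configuration; console should report raid0
    run raidctl -I 12345 raid0    # set the serial number
    run raidctl -iv raid0         # initialize the array and rewrite parity
    run raidctl -A yes raid0      # optional: mark for auto-configuration
}

create_array
```

Once the printed commands look right for your disks, rerun with RUN=1 as root.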
Performance
Early performance tests show ~66MB/s reading and ~49MB/s writing through the
filesystem on a 3-disk RAID 0 array composed of Quantum Ultra160 drives. Feel
free to submit your performance numbers, along with your dmesg output. The
driver does provide stats to the devstat facility, so iostat can be used to
monitor performance.
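For example, throughput could be sampled while a benchmark runs with something
like the following. The -w (interval) and -c (count) options are standard
FreeBSD iostat flags; the devstat name raid0 is an assumption about how the
driver registers itself. The script prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Print each command instead of executing it unless RUN=1 is set.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

DEV="${DEV:-raid0}"   # assumed devstat name of the array
INTERVAL=1            # seconds between samples
COUNT=10              # number of samples to take

monitor() {
    run iostat -w "$INTERVAL" -c "$COUNT" "$DEV"
}

monitor
```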
Credits
Much praise and many thanks to Greg Oster
for his NetBSD port and his patience in answering my countless questions.
Thanks to William Carrel for helping me track down a nasty panic that occurs
when manually configuring an array.