Index: man4/geom.4 =================================================================== RCS file: /home/ncvs/src/share/man/man4/geom.4,v retrieving revision 1.9 diff -u -r1.9 geom.4 --- man4/geom.4 21 May 2003 15:55:40 -0000 1.9 +++ man4/geom.4 18 Jun 2003 00:16:02 -0000 @@ -41,28 +41,40 @@ .Nm GEOM .Nd modular disk I/O request transformation framework. .Sh DESCRIPTION -The GEOM framework provides an infrastructure in which "classes" +The +.Nm +framework provides an infrastructure in which +.Dq classes can perform transformations on disk I/O requests on their path from -the upper kernel to the device drivers and back. +the upper half of the kernel to the device drivers and back. .Pp -Transformations in a GEOM context range from the simple geometric +Transformations in a +.Nm +context range from the simple geometric displacement performed in typical disk partitioning modules over RAID algorithms and device multipath resolution to full blown cryptographic protection of the stored data. .Pp -Compared to traditional "volume management", GEOM differs from most +Compared to traditional +.Dq volume management , +.Nm +differs from most and in some cases all previous implementations in the following ways: .Bl -bullet .It -GEOM is extensible. It is trivially simple to write a new class -of transformation and it will not be given stepchild treatment. If -someone for some reason wanted to mount IBM MVS diskpacks, a class +.Nm +is extensible. +It is trivially simple to write a new class +of transformation and it will not be given step-child treatment. +If someone for some reason wanted to mount IBM MVS diskpacks, a class recognizing and configuring their VTOC information would be a trivial matter. .It -GEOM is topologically agnostic. Most volume management implementations +.Nm +is topologically agnostic. 
+Most volume management implementations have very strict notions of how classes can fit together, very often -one fixed hierarchy is provided for instance subdisk - plex - +one fixed hierarchy is provided for instance, subdisk - plex - volume. .El .Pp @@ -75,28 +87,45 @@ physical disks and then partition the mirror into subdisks, instead one is forced to make subdisks on the physical volumes and to mirror these two and two resulting in a much more complex configuration. -GEOM on the other hand does not care in which order things are done, +.Nm +on the other hand does not care in which order things are done, the only restriction is that cycles in the graph will not be allowed. .Pp .Sh "TERMINOLOGY and TOPOLOGY" -Geom is quite object oriented and consequently the terminology +.Nm +is quite object oriented and consequently the terminology borrows a lot of context and semantics from the OO vocabulary: .Pp -A "class", represented by the data structure g_class implements one -particular kind of transformation. Typical examples are MBR disk -partition, BSD disklabel, and RAID5 classes. -.Pp -An instance of a class is called a "geom" and represented by the -data structure "g_geom". In a typical i386 FreeBSD system, there +A +.Dq class , +represented by the data structure +.Vt g_class +implements one particular kind of transformation. +Typical examples are MBR disk partition, BSD disk label, and RAID5 classes. +.Pp +An instance of a class is called a +.Dq geom +and represented by the data structure +.Vt g_geom . +In a typical i386 FreeBSD system, there will be one geom of class MBR for each disk. .Pp -A "provider", represented by the data structure "g_provider", is -the front gate at which a geom offers service. +A +.Dq provider , +represented by the data structure +.Vt g_provider , +is the front gate at which a geom offers service. A provider is "a disk-like thing which appears in /dev" - a logical disk in other words. 
-All providers have three main properties: name, sectorsize and size. -.Pp -A "consumer" is the backdoor through which a geom connects to another +All providers have three main properties: +.Va name , +.Va sectorsize +and +.Va size . +.Pp +A +.Dq consumer +is the backdoor through which a geom connects to another geom provider and through which I/O requests are sent. .Pp The topological relationship between these entities are as follows: @@ -116,19 +145,20 @@ .El .Pp All geoms have a rank-number assigned, which is used to detect and -prevent loops in the acyclic directed graph. This rank number is -assigned as follows: +prevent loops in the acyclic directed graph. +This rank number is assigned as follows: .Bl -enum .It -A geom with no attached consumers has rank=1 +A geom with no attached consumers has a rank of +.Ql 1 . .It A geom with attached consumers has a rank one higher than the -highest rank of the geoms of the providers its consumers are -attached to. +highest rank of the geoms of the providers to which its consumers are +attached. .El .Sh "SPECIAL TOPOLOGICAL MANEUVERS" In addition to the straightforward attach, which attaches a consumer -to a provider, and detach, which breaks the bond, a number of special +to a provider and detach, which breaks the bond, a number of special topological maneuvers exists to facilitate configuration and to improve the overall flexibility. .Pp @@ -144,7 +174,9 @@ provider will be offered to all classes in turn. .Pp Exactly what a class does to recognize if it should accept the offered -provider is not defined by GEOM, but the sensible set of options are: +provider is not defined by +.Nm , +but the sensible set of options are: .Bl -bullet .It Examine specific data structures on the disk. @@ -161,15 +193,16 @@ it potentially is still being used. .Pp When a geom orphans a provider, all future I/O requests will -"bounce" on the provider with an error code set by the geom. 
Any -consumers attached to the provider will receive notification about -the orphanization when the eventloop gets around to it, and they +.Dq bounce +on the provider with an error code set by the geom. +Any consumers attached to the provider will receive notification about +the orphanization when the event loop gets around to it and they can take appropriate action at that time. .Pp -A geom which came into being as a result of a normal taste operation -should selfdestruct unless it has a way to keep functioning lacking +A geom which came into being as a result of a normal taste +operation should self-destruct unless it has a way to keep functioning lacking the orphaned provider. -Geoms like diskslicers should therefore selfdestruct whereas +Geoms like disk slicers should therefore self-destruct whereas RAID5 or mirror geoms will be able to continue, as long as they do not loose quorum. .Pp @@ -181,7 +214,7 @@ The typical scenario is .Bl -bullet -offset indent -compact .It -A device driver detects a disk has departed and orphans the provider for it. +A device driver detects that a disk has departed and orphans the provider for it. .It The geoms on top of the disk receive the orphanization event and orphans all their providers in turn. @@ -190,11 +223,13 @@ This process continues in a quasi-recursive fashion until all relevant pieces of the tree has heard the bad news. .It -Eventually the buck stops when it reaches geom_dev at the top -of the stack. -.It -Geom_dev will call destroy_dev(9) to stop any more request from -coming in. +Eventually the buck stops when it reaches +.Em geom_dev +at the top of the stack. +.It +Geom_dev will call +.Xr destroy_dev 9 +to stop any more requests from coming in. It will sleep until all (if any) outstanding I/O requests have been returned. It will explicitly close (ie: zero the access counts), a change @@ -221,19 +256,35 @@ It is probably easiest to understand spoiling by going through an example. 
.Pp -Imagine a disk, "da0" on top of which a MBR geom provides -"da0s1" and "da0s2" and on top of "da0s1" a BSD geom provides -"da0s1a" through "da0s1e", both the MBR and BSD geoms have +Imagine a disk, +.Pa da0 +on top of which a MBR geom provides +.Pa da0s1 +and +.Pa da0s2 +and on top of +.Pa da0s1 +a BSD geom provides +.Pa da0s1a +through +.Pa da0s1e , +both the MBR and BSD geoms have autoconfigured based on data structures on the disk media. -Now imagine the case where "da0" is opened for writing and those -data structures are modified or overwritten: Now the geoms would +Now imagine the case where +.Pa da0 +is opened for writing and those +data structures are modified or overwritten: now the geoms would be operating on stale metadata unless some notification system can inform them otherwise. .Pp -To avoid this situation, when the open of "da0" for write happens, +To avoid this situation, when the open of +.Pa da0 +for write happens, all attached consumers are told about this, and geoms like -MBR and BSD will selfdestruct as a result. -When "da0" is closed again, it will be offered for tasting again +MBR and BSD will self-destruct as a result. +When +.Pa da0 +is closed again, it will be offered for tasting again and if the data structures for MBR and BSD are still there, new geoms will instantiate themselves anew. .Pp @@ -241,15 +292,19 @@ .Pp If any of the paths through the MBR or BSD module were open, they would have opened downwards with an exclusive bit rendering it -impossible to open "da0" for writing in that case and conversely +impossible to open +.Pa da0 +for writing in that case and conversely the requested exclusive bit would render it impossible to open a -path through the MBR geom while "da0" is open for writing. +path through the MBR geom while +.Pa da0 +is open for writing. .Pp From this it also follows that changing the size of open geoms can only be done with their cooperation. 
.Pp Finally: the spoiling only happens when the write count goes from -zero to non-zero and the retasting only when the write count goes +zero to non-zero and re-tasting only when the write count goes from non-zero to zero. .Pp .Em INSERT/DELETE @@ -257,7 +312,7 @@ to be instantiated between a consumer and a provider attached to each other and to remove it again. .Pp -To understand the utility of this, imagine a provider with +To understand the utility of this, imagine a provider which is being mounted as a file system. Between the DEVFS geoms consumer and its provider we insert a mirror module which configures itself with one mirror @@ -269,34 +324,42 @@ We have now in essence moved a mounted file system from one disk to another while it was being used. At this point the mirror geom can be deleted from the path -again, it has served its purpose. +again, as it has served its purpose. .Pp .Em CONFIGURE is the process where the administrator issues instructions -for a particular class to instantiate itself. There are multiple +for a particular class to instantiate itself. +There are multiple ways to express intent in this case, a particular provider can be specified with a level of override forcing for instance a BSD disklabel module to attach to a provider which was not found palatable during the TASTE operation. .Pp -Finally IO is the reason we even do this: it concerns itself with +Finally, I/O is the reason we even do this: it concerns itself with sending I/O requests through the graph. .Pp -.Em "I/O REQUESTS -represented by struct bio, originate at a consumer, +.Em I/O REQUESTS +represented by +.Vt struct bio , +originate at a consumer, are scheduled on its attached provider, and when processed, returned to the consumer. -It is important to realize that the struct bio which -enters through the provider of a particular geom does not "come -out on the other side". 
+It is important to realize that the +.Vt struct bio +which +enters through the provider of a particular geom does not +.Dq come out on the other side . Even simple transformations like MBR and BSD will clone the -struct bio, modify the clone, and schedule the clone on their -own consumer. -Note that cloning the struct bio does not involve cloning the -actual data area specified in the IO request. -.Pp -In total four different IO requests exist in GEOM: read, write, -delete, and get attribute. +.Vt struct bio , +modify the clone, and schedule the clone on their own consumer. +Note that cloning the +.Vt struct bio +does not involve cloning the +actual data area specified in the I/O request. +.Pp +In total four different I/O requests exist in +.Nm : +read, write, delete, and get attribute. .Pp Read and write are self explanatory. .Pp @@ -311,24 +374,42 @@ It is important to recognize that a delete indication is not a request and consequently there is no guarantee that the data actually will be erased or made unavailable unless guaranteed by specific -geoms in the graph. If "secure delete" semantics are required, a +geoms in the graph. +If +.Dq secure delete +semantics are required, a geom should be pushed which converts delete indications into (a -sequence of) write requests. +a sequence of) write requests. .Pp Get attribute supports inspection and manipulation of out-of-band attributes on a particular provider or path. -Attributes are named by ascii strings and they will be discussed in +Attributes are named by ASCII strings and they will be discussed in a separate section below. .Pp (stay tuned while the author rests his brain and fingers: more to come.) +.Sh SEE ALSO +.Xr destroy_dev 9 , +.Xr device 9 .Sh HISTORY -This software was developed for the FreeBSD Project by Poul-Henning Kamp +This software was developed for the FreeBSD Project by +.An Poul-Henning Kamp and NAI Labs, the Security Research Division of Network Associates, Inc. 
under DARPA/SPAWAR contract N66001-01-C-8035 ("CBOSS"), as part of the DARPA CHATS research program. .Pp -The first precursor for GEOM was a gruesome hack to Minix 1.2 and was -never distributed. An earlier attempt to implement a less general scheme -in FreeBSD never succeeded. +The +.Nm +framework first appeared in +.Fx 5.0 . +.Pp +The first precursor for +.Nm +was a gruesome hack to Minix 1.2 and was +never distributed. +An earlier attempt to implement a less general scheme +in +.Fx +never succeeded. .Sh AUTHORS +This manual page was written by .An "Poul-Henning Kamp" Aq phk@FreeBSD.org
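
Reviewer note, not part of the patch: the rank rule described in the TERMINOLOGY and TOPOLOGY hunk (rank 1 with no attached consumers, otherwise one higher than the highest rank below) can be sketched as follows. This is a minimal illustration only. The sketch_* structures and sketch_rank() are hypothetical stand-ins invented for this note and are not the kernel's g_geom, g_provider or g_consumer definitions, nor the way GEOM itself computes or stores ranks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for illustration only. */
struct sketch_geom;

struct sketch_provider {
	struct sketch_geom *geom;		/* geom offering this provider */
};

struct sketch_consumer {
	struct sketch_provider *provider;	/* attached provider, NULL if detached */
};

struct sketch_geom {
	struct sketch_consumer *consumers;	/* array of this geom's consumers */
	size_t nconsumers;
};

/*
 * Rank as described in the manual page text: a geom with no attached
 * consumers has rank 1; otherwise its rank is one higher than the
 * highest rank among the geoms whose providers its consumers attach to.
 * Assumes the graph is acyclic, as the manual page requires.
 */
static int
sketch_rank(const struct sketch_geom *gp)
{
	int highest = 0;

	assert(gp != NULL);
	for (size_t i = 0; i < gp->nconsumers; i++) {
		const struct sketch_provider *pp = gp->consumers[i].provider;

		if (pp == NULL)
			continue;	/* detached consumers do not count */
		int r = sketch_rank(pp->geom);
		if (r > highest)
			highest = r;
	}
	return (highest + 1);
}
```

With a disk geom at the bottom, an MBR geom attached to its provider, and a BSD geom attached to an MBR provider, the sketch yields ranks 1, 2 and 3, matching the da0 / da0s1 / da0s1a example used in the spoiling discussion.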