Multiple-Pass Proposal

To aid with solving these problems, I propose extending the ``new-bus'' framework to perform multiple passes over the device tree during boot. Device driver attachments would now be assigned a ``pass'' value. A driver is only invoked to probe devices once the system-wide pass level is greater than or equal to the attachment's pass level. A driver's pass level is tied to a specific attachment. If a driver attaches to multiple buses, then it may have different pass levels for each attachment. This allows for stronger ordering of drivers with respect to each other across the device tree. It also allows the kernel to perform work other than attaching drivers while the tree is only partially attached.

The current implementation defers probing the majority of drivers until the final pass during boot. These drivers require no code changes. Drivers that wish to probe during an earlier pass do require changes.

Changes to ``new-bus'' Infrastructure

The changes to ``new-bus'' itself to support multiple passes are relatively minor. A few places have to be adjusted to ignore drivers whose pass level is greater than the current system-wide pass level. A facility for raising the pass level and triggering scans of the device tree for new pass levels is also required.

Special Handling of Driver Methods

A few places where ``new-bus'' invokes device driver methods must skip drivers with a pass level greater than the current system pass level. Most device driver methods are only invoked once a driver has probed and attached to a device. If device drivers are prevented from probing devices too early, then those methods do not need special handling. In fact, the only methods which must be handled specially are DEVICE_IDENTIFY and DEVICE_PROBE. Specifically, bus_generic_probe only invokes DEVICE_IDENTIFY for driver attachments whose pass level is less than or equal to the current pass. Similarly, device_probe only tries driver attachments whose pass level is less than or equal to the current pass. As a result, device drivers may continue to use bus_generic_probe and bus_generic_attach to attach child devices as they do now.
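The gating rule for DEVICE_IDENTIFY and DEVICE_PROBE can be modeled in a few lines of userland C. This is only a sketch: struct attachment, eligible, and count_eligible are hypothetical names rather than the kernel's data structures, and the pass numbers used with it are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one driver attachment; not the kernel's structure. */
struct attachment {
	const char *name;
	int pass;		/* earliest pass at which it may probe */
};

/* An attachment may identify or probe once the system pass reaches its own. */
static int
eligible(const struct attachment *a, int current_pass)
{
	return (a->pass <= current_pass);
}

/* Count the attachments that a probe at current_pass would try. */
static size_t
count_eligible(const struct attachment *tbl, size_t n, int current_pass)
{
	size_t i, hits = 0;

	assert(tbl != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (eligible(&tbl[i], current_pass))
			hits++;
	return (hits);
}
```

With a table holding attachments at passes 10, 40, and 1000, a probe at pass 10 would try only the first, while a probe at pass 1000 would try all three.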

One bus method also requires special treatment. The BUS_PROBE_NOMATCH method is called for devices that are not claimed by any driver during boot. Most devices will not be claimed by any driver during the early passes, however, which would result in many spurious calls of BUS_PROBE_NOMATCH. The solution is to change device_probe to only invoke this method during the final pass.
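A small sketch of the BUS_PROBE_NOMATCH rule, under the assumption that the final pass is represented by INT_MAX. The probe_device function and the nomatch_calls counter are hypothetical stand-ins for device_probe and the bus method.

```c
#include <assert.h>
#include <limits.h>

#define FINAL_PASS INT_MAX	/* stands in for the default (final) pass */

static int nomatch_calls;	/* how many times the bus was notified */

/*
 * Hypothetical probe wrapper: 'matched' says whether any eligible driver
 * claimed the device.  The no-match notification fires only in the final
 * pass, so failures during early passes stay silent.
 */
static int
probe_device(int matched, int current_pass)
{
	assert(current_pass >= 0);
	if (matched)
		return (0);
	if (current_pass == FINAL_PASS)
		nomatch_calls++;
	return (-1);
}
```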

Managing Pass Levels

Most of the changes to ``new-bus'' provide management of pass levels. These changes include tracking the passes used by drivers, raising the pass level, and rescanning the device tree.

One of the goals of this implementation is to support a dynamic set of sparse pass levels. The interface should be similar to the subsystem levels used with SYSINIT() [6]. Adding a new driver with a new pass level to the system should cause ``new-bus'' to rescan the tree for that pass.

A simple implementation would be to rescan the tree for every possible pass level during boot from zero to INT_MAX. However, it is expected that there will normally be very few active pass levels, so the vast majority of these tree scans would be wasted effort. Instead, a new list of active pass levels is maintained. The list is sorted by pass level and contains one driver per pass level. When a new driver is registered with the system during boot, that driver is added to the list if it uses a previously-unused pass level. This provides a quick and easy way to enumerate the pass levels in use.
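The bookkeeping for active pass levels might look roughly like the following. The sketch simplifies the design described above: it stores bare pass numbers in a fixed-size array instead of linking one driver per pass level, and register_pass, active_passes, and MAX_PASSES are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PASSES 16	/* arbitrary capacity for this sketch */

static int active_passes[MAX_PASSES];	/* kept in ascending order */
static size_t n_active;

/*
 * Record the pass level of a newly registered driver.  Returns 1 if the
 * level was previously unused, 0 if an earlier driver already uses it.
 */
static int
register_pass(int pass)
{
	size_t i, j;

	for (i = 0; i < n_active; i++) {
		if (active_passes[i] == pass)
			return (0);
		if (active_passes[i] > pass)
			break;
	}
	assert(n_active < MAX_PASSES);
	/* Shift the larger levels up by one slot and insert in order. */
	for (j = n_active; j > i; j--)
		active_passes[j] = active_passes[j - 1];
	active_passes[i] = pass;
	n_active++;
	return (1);
}
```

Registering drivers at passes 40, 10, 40, and 30 in that order leaves three entries in the list: 10, 30, 40.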

The current system pass level is stored in a new global variable bus_current_pass. The system pass level can be raised to a specific value by calling the new bus_set_pass function. The requested pass level does not have to be used by any drivers, but lowering the pass level is not permitted.

The bus_set_pass function may invoke multiple scans of the tree during a single call. It walks the list of active pass levels until it either hits the end of the list or encounters a pass level higher than the requested level. Any pass levels less than the current system pass level are skipped. The remaining pass levels each trigger a separate scan of the device tree.
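The walk performed by bus_set_pass can be sketched as follows. This is a userland model, not the kernel source: set_pass, scan_tree, and the contents of active_passes are illustrative, and an assertion stands in for whatever the kernel does when asked to lower the pass level.

```c
#include <assert.h>
#include <stddef.h>

/* Active pass levels in ascending order; the values are illustrative. */
static const int active_passes[] = { 10, 30, 40, 1000 };
static const size_t n_active =
    sizeof(active_passes) / sizeof(active_passes[0]);

static int current_pass;	/* models bus_current_pass */
static int scans;		/* tree scans triggered so far */

/* Stand-in for rescanning the device tree from the root. */
static void
scan_tree(void)
{
	scans++;
}

/* Raise the system pass level, scanning once per active pass on the way. */
static void
set_pass(int pass)
{
	size_t i;

	assert(pass >= current_pass);	/* lowering is not permitted */
	for (i = 0; i < n_active; i++) {
		if (active_passes[i] < current_pass)
			continue;	/* below the current level */
		if (active_passes[i] > pass)
			break;		/* beyond the requested level */
		current_pass = active_passes[i];
		scan_tree();
	}
	current_pass = pass;	/* the target need not be an active level */
}
```

Starting from pass 0, set_pass(35) triggers two scans (for levels 10 and 30) and leaves the system at pass 35, even though no driver uses that level. A subsequent set_pass(40) triggers one more scan.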

Rescanning the device tree is implemented by a new bus method BUS_NEW_PASS. The bus_set_pass function invokes this method on the root_bus device each time the pass level is raised. A default implementation is provided by the new bus_generic_new_pass function. It first walks all the driver attachments for the current device. If any of the attachments use the new pass level, then their DEVICE_IDENTIFY method is invoked. After this is completed, it walks the list of child devices. If a child device has an attached driver, then the driver's BUS_NEW_PASS method is invoked. Otherwise, the device is reprobed. This allows drivers that were made eligible for probing by the new pass to now probe the device.
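The recursive rescan can be illustrated with a toy device tree. The model below makes several simplifying assumptions: each device has exactly one candidate driver with a fixed pass level, probing always succeeds once that level is reached, and the DEVICE_IDENTIFY step is reduced to a comment. All names (struct device, probe_and_attach, generic_new_pass) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/*
 * Hypothetical device node: 'driver_pass' is the pass of the only driver
 * that can claim it; 'attached' records whether that has happened.
 */
struct device {
	int driver_pass;
	int attached;
	struct device *children[MAX_CHILDREN];
	size_t nchildren;
};

static int current_pass;	/* models bus_current_pass */

/* Reprobe a device; a newly attached bus probes its own children too. */
static void
probe_and_attach(struct device *dev)
{
	size_t i;

	if (dev->driver_pass > current_pass)
		return;		/* driver not yet eligible */
	dev->attached = 1;
	for (i = 0; i < dev->nchildren; i++)
		probe_and_attach(dev->children[i]);
}

/* Model of the default new-pass handler for an attached bus device. */
static void
generic_new_pass(struct device *bus)
{
	size_t i;

	assert(bus->attached);
	/* DEVICE_IDENTIFY for attachments at the new pass would run here. */
	for (i = 0; i < bus->nchildren; i++) {
		struct device *child = bus->children[i];

		if (child->attached)
			generic_new_pass(child);	/* pass the scan down */
		else
			probe_and_attach(child);	/* newly eligible? */
	}
}
```

Calling generic_new_pass on the root after each increase of current_pass attaches exactly those devices whose drivers became eligible at that pass and leaves the rest for a later scan.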

Writing an Early Pass Driver

In a system with multiple passes of the device tree, the majority of existing drivers will only probe devices during the final pass. These drivers do not need any modifications. Drivers that do wish to probe devices during an earlier pass do require modifications, however.

All early drivers are required to indicate the earliest pass at which they are eligible to probe devices. This is accomplished by a new EARLY_DRIVER_MODULE macro. This macro is similar to the existing DRIVER_MODULE macro. It simply adds a new argument to specify the pass level of the driver attachment. For some drivers this is the only modification needed.

Bus drivers may need additional changes to their attach routines. Specifically, many of the tasks bus drivers currently perform in their attach routines may need to be deferred until a specific pass has completed. For example, PCI bus drivers should not attempt to route interrupts for child devices until after the pass which adds interrupt controller drivers is completed. Buses should also not assign resources to devices until system resource drivers have attached and reserved resources that other devices should not use.

One possible method for addressing this is for bus drivers to provide a custom method for BUS_NEW_PASS. This custom method would perform actions dependent on an earlier pass level the first time the pass level is raised to a greater level. This would be a bit clunky and require bus drivers to keep track of which initialization steps had already been performed, however. It is also not very intuitive that one cannot simply ``hook'' into an existing pass level. Instead, the bus driver needs to be certain that an entire pass has completed before it performs actions that depend on that pass. This requires the actions to be deferred until the next pass level instead.

Another approach would be to add specific event notification methods to the bus interface. For example, a new BUS_ASSIGN_RESOURCES method would be invoked at the top-level when the system was ready for buses to assign resources to child devices. One downside of this method is that bus drivers would be required to explicitly pass the notifications down to continue the tree scans. This could be somewhat mitigated by providing default implementations similar to bus_generic_new_pass.

If a bus driver can be attached after boot, then any changes made to its attach routine will need to take this into account. It can do this by conditionally performing tasks such as resource allocation in its attach routine based on the current system pass level.

Finally, if a bus driver uses hints to enumerate child devices, it may need to defer adding hinted children. If all of the bus's child devices are enumerated by hints, then no changes are needed. However, if the bus supports a mixture of self-enumerated devices and hint-enumerated devices and allows self-enumerated devices to claim hint devices via bus_hint_device_unit, then special care must be taken to not add hint-enumerated devices until after all of the self-enumerated devices have been probed during the final pass. The only driver that currently has to deal with this is the ISA bus driver on the amd64 and i386 platforms.

Pass Levels

The multiple pass system is designed to easily allow new pass levels to be added. At a minimum it requires two pass levels to be present: the initial pass level used to attach the root_bus device and the default pass level used by most drivers. However, for the system to be useful, additional pass levels must be used. A list of possible pass levels in increasing order follows. Each level is named by a constant present in <sys/bus.h>.

The BUS_PASS_ROOT pass level is level 0 and is reserved as a marker for the root_bus device. The root bus driver does not probe and attach normally, so it does not have an actual pass number assigned. Instead, ``new-bus'' creates the device and assigns the driver manually to provide a starting point for the device tree. Pass level 0 is similarly special in that it is the initial system pass level. No drivers should use this pass level.

The BUS_PASS_BUS pass level is used by bus and bridge drivers. These drivers are responsible for populating the device tree. Note that other early drivers need device nodes to probe and attach to, so this is a prerequisite for other early drivers.

The BUS_PASS_CPU pass level is used to create devices for CPUs. It is simply a continuation of the previous level to fully enumerate the device tree. Note that devices that attach to CPUs, such as cpufreq(4) drivers, may probe later.

The BUS_PASS_RESOURCE pass level is used by drivers that need to probe before resources are assigned to devices. System resource drivers would attach at this pass. Once this pass is complete, bus drivers may assign non-interrupt resources to devices.

The BUS_PASS_INTERRUPT pass level is used by drivers that provide interrupt support services. Interrupt controllers and PCI interrupt routers would attach during this pass. Once this pass is complete, bus drivers may assign interrupt resources to devices.

The BUS_PASS_TIMER pass level is used by drivers that implement any timers needed to drive the thread scheduler. Clock drivers would attach during this pass.

The BUS_PASS_SCHEDULER pass level is used to indicate the point at which the thread scheduler is started. Any other devices not already probed that are required for the thread scheduler would attach during this pass. Once this pass is complete, the thread scheduler could be started.

Finally, the BUS_PASS_DEFAULT pass level is the final pass level. It has a value of INT_MAX. All devices not already probed by an earlier pass would attach during this pass if a suitable driver is available.
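The ordering of the levels can be summarized in a sparse enumeration. Only the values of BUS_PASS_ROOT (0) and BUS_PASS_DEFAULT (INT_MAX) are given above. The intermediate values below are placeholders chosen to leave gaps for new levels, and pass_table and pass_name are hypothetical helpers.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Intermediate values are illustrative; only 0 and INT_MAX are from the text. */
enum pass_level {
	BUS_PASS_ROOT      = 0,
	BUS_PASS_BUS       = 10,
	BUS_PASS_CPU       = 20,
	BUS_PASS_RESOURCE  = 30,
	BUS_PASS_INTERRUPT = 40,
	BUS_PASS_TIMER     = 50,
	BUS_PASS_SCHEDULER = 60,
	BUS_PASS_DEFAULT   = INT_MAX
};

static const struct {
	int level;
	const char *name;
} pass_table[] = {
	{ BUS_PASS_ROOT, "root" },
	{ BUS_PASS_BUS, "bus" },
	{ BUS_PASS_CPU, "cpu" },
	{ BUS_PASS_RESOURCE, "resource" },
	{ BUS_PASS_INTERRUPT, "interrupt" },
	{ BUS_PASS_TIMER, "timer" },
	{ BUS_PASS_SCHEDULER, "scheduler" },
	{ BUS_PASS_DEFAULT, "default" },
};

/* Name of the latest listed level that has been reached at 'pass'. */
static const char *
pass_name(int pass)
{
	size_t i, n = sizeof(pass_table) / sizeof(pass_table[0]);
	const char *name = pass_table[0].name;

	assert(pass >= BUS_PASS_ROOT);
	for (i = 0; i < n && pass_table[i].level <= pass; i++)
		name = pass_table[i].name;
	return (name);
}
```

Because the numbering is sparse, a new level (say, one between the resource and interrupt passes) could be added without renumbering its neighbors.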

