Note: the problems and solutions are described in terms of the conditions at the time each problem was addressed. Advice given for a solution may no longer be relevant to the -current source. In any case, it should not be an issue for anyone running -current, as the problem should no longer exist.



(Old) Kernel Problems


SMP kernel with APIC_IO deadlocks under heavy load.

During a page fault, it generally happens that at some point smp_invltlb() gets called to flush the TLB on the other CPUs. smp_invltlb() calls allButSelfIPI() and sends an IPI to the other processor, which, unfortunately, is sometimes already processing an interrupt of a higher priority. That interrupt routine now spends its time trying to obtain the mp_lock spin lock so it can enter the kernel, but the processor that holds this lock is itself in a spin loop in apicIPI(), waiting for the IPI to be delivered. Neither side can make progress. This is only a problem when the APIC FIFO is already full, preventing delivery of additional INT messages.

Solution:

Perhaps we could release the mplock while sending an IPI, and try to grab it back again before continuing... Alternatively, have a timeout on the IPI, and if the APIC hasn't recovered after a certain amount of time (i.e. it is indefinitely "busy"), release the mplock for a moment, wait, and check the status again before reacquiring the lock. If it still fails to recover, panic rather than hang forever.
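The timeout idea can be sketched as a small user-space simulation. This is only an illustration of the control flow, not the kernel code: sim_apic, sim_lock, send_ipi_with_timeout(), and the spin_limit/max_rounds parameters are all hypothetical names, and the "APIC" is just a counter that reports busy for a fixed number of polls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the hardware and the lock; none of these
 * names come from the kernel source. */
struct sim_apic {
    int busy_polls;   /* how many more status polls will report "busy" */
};

struct sim_lock {
    bool held;
    int  releases;    /* times the lock was dropped while waiting */
};

enum ipi_result { IPI_SENT, IPI_STUCK };

/* Returns true while the simulated APIC still reports delivery pending. */
static bool
apic_delivery_pending(struct sim_apic *a)
{
    if (a->busy_polls > 0) {
        a->busy_polls--;
        return true;
    }
    return false;
}

/*
 * Poll until the APIC is idle. After spin_limit busy polls, drop the
 * lock briefly so another CPU could enter the kernel and finish its
 * interrupt, then take the lock back and poll again. Give up after
 * max_rounds such attempts.
 */
static enum ipi_result
send_ipi_with_timeout(struct sim_apic *a, struct sim_lock *l,
                      int spin_limit, int max_rounds)
{
    assert(l->held && spin_limit > 0 && max_rounds > 0);

    for (int round = 0; round < max_rounds; round++) {
        for (int spin = 0; spin < spin_limit; spin++) {
            if (!apic_delivery_pending(a))
                return IPI_SENT;
        }
        /* Timed out: release the lock, then reacquire it.
         * A real kernel would pause between these two steps. */
        l->held = false;
        l->releases++;
        l->held = true;
    }
    return IPI_STUCK;   /* the caller would panic rather than hang */
}
```

The key design point is that the wait is bounded at both levels: the inner loop limits how long the lock is held while spinning, and the outer loop limits how long the system can appear hung before the failure is reported.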

My latest fix appears to work:


Systems with > 1 PCI bus may fail.

The MP spec, appendix D.2, specifies that when MORE than 1 PCI bus exists, the PCI buses should be assigned IDs first, using the actual PCI bus numbers. IDs are then assigned to the other buses using whatever numbers are free.

BIOSes for systems with ONLY ONE PCI bus on the MOTHERBOARD often list the ISA/EISA bus first, thus making the one and only PCI bus have ID 1. When a PCI bridge card exists in the system, another PCI bus then exists, causing the PCI bus numbering (i.e. the ID #s) to be incorrect. Even when no such card exists, having the ISA bus first causes the PCI bus ID to NOT match the actual PCI bus #. This makes identifying PCI bus:device:int info from the MP table INTs section difficult (prone to error). To work around this problem the current code ignores the bus ID when identifying PCI INT associations, causing possible errors on systems with multiple PCI buses.
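One possible way to cope with a misordered table is to build a translation from MP table bus IDs to PCI bus numbers instead of ignoring the bus ID. The sketch below is not the code described above. It assumes, without any guarantee from the BIOS, that PCI entries appear in the table in ascending PCI bus-number order. struct bus_entry is a simplified, hypothetical form of an MP table bus entry (a bus ID plus the 6-character, space-padded type string), and build_pci_bus_map() is an invented name.

```c
#include <assert.h>
#include <string.h>

#define MAX_BUSES   16
#define NO_PCI_BUS  (-1)

/* Simplified, hypothetical view of an MP table bus entry. */
struct bus_entry {
    unsigned char bus_id;       /* ID assigned by the BIOS */
    char          bus_type[7];  /* e.g. "ISA   ", "PCI   " plus NUL */
};

/*
 * Fill map[] so that map[mp_bus_id] gives the assumed PCI bus number,
 * or NO_PCI_BUS for non-PCI buses. PCI bus numbers are handed out in
 * table order, which is an assumption, not something the table states.
 * Returns the number of PCI buses found.
 */
static int
build_pci_bus_map(const struct bus_entry *entries, int n,
                  int map[MAX_BUSES])
{
    int next_pci = 0;

    assert(n >= 0 && n <= MAX_BUSES);

    for (int i = 0; i < MAX_BUSES; i++)
        map[i] = NO_PCI_BUS;

    for (int i = 0; i < n; i++) {
        if (entries[i].bus_id >= MAX_BUSES)
            continue;           /* ignore IDs outside the sketch's table */
        if (strncmp(entries[i].bus_type, "PCI", 3) == 0)
            map[entries[i].bus_id] = next_pci++;
    }
    return next_pci;
}
```

With the common "ISA first" layout, MP bus ID 1 then maps to PCI bus 0, so an INT entry that names bus ID 1 can be matched against devices on the real PCI bus 0.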

Solution:

See MP spec v1.4, D.2 & D.3 for details on required format.


Setting smp_active to '0' after starting other CPUs crashes system.

When smp_active == 0, all the lock code is ignored. If you start the additional CPU(s) via sysctl and then attempt to stop them by setting smp_active to '0' (instead of the proper value of '1'), multiple CPUs enter the kernel at the same time, causing random corruption and crashes.

Solution:

Added a SYSCTL_PROC to limit the range of legal values.
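The kind of check such a handler performs can be sketched in user space as follows. This is not the actual handler: struct smp_state, set_smp_active(), and max_value are hypothetical, and the exact legal range is not given here. The sketch only assumes that 0 must be refused once the additional CPUs are running.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical state; only smp_active is named in the text above. */
struct smp_state {
    int  smp_active;
    bool aps_started;   /* additional CPUs have been started */
};

/*
 * Validate a requested smp_active value before storing it.
 * Once the other CPUs are running, 0 is rejected because it would
 * disable the lock code while several CPUs can enter the kernel.
 * Returns 0 on success or EINVAL if the value is out of range.
 */
static int
set_smp_active(struct smp_state *s, int new_value, int max_value)
{
    assert(max_value >= 1);

    int min_value = s->aps_started ? 1 : 0;

    if (new_value < min_value || new_value > max_value)
        return EINVAL;

    s->smp_active = new_value;
    return 0;
}
```

Rejecting the write and leaving the old value in place means a mistyped sysctl produces an error message instead of a crash.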


static (ROM based) MP tables and PCI bridge cards.

Some (most?) BIOSes have a hard-coded MP table in ROM, meaning that bridged PCI cards and other complex PCI setups will NOT be properly described in the MP table. As such, insufficient info exists in the table to properly set up the APIC for these cards.

Solution:

See the page on bridged PCI cards for details of a workaround. The long-term solution will be a pass where we build an MP table in core based on information received from the PCI sub-system, i.e. we will toss certain sections of the motherboard-provided MP table!


PnP boards can cause problems, as I have not yet implemented code to "undirect" PCI INTs when they are sent to the APIC. Since an upper ( >15 ) INT is registered in place of the original ( <=15 ), the original may be assigned later to a PnP card. BUT the lower INT line is NOT un-redirected. This means both the PCI hardware (via the redirect hardware) AND the PnP card are yanking on the same lower INT line, causing random problems.

Solution:



(Old) UserLand Code Problems


dset probably fails, isa_device.h:id_irq size: u_int vs. u_short.

Solution:

isa_device.h:id_irq promoted to u_int for both UP and SMP kernels.



Hardware Setup Problems


BIOS must disable 2nd IO APIC if present.

Some machines have a 2nd IO APIC available. If it is enabled in the BIOS the mptable will have references to it which confuse the SMP kernel.

Solution:

FIXED in 4.0/5.0 as of 4.21.2000.


motherboard MP table declares 8254 timer to be connected to the IO APIC, but it actually isn't, causing a kernel lockup.

On some motherboards the 8254 timer output is not connected to the IO APIC. If the MP table correctly shows this fact by the lack of an INT entry for the 8254 timer, the kernel will enable "mixed-mode" programming, getting around the problem. However, not all MP tables do so. Instead they declare that the 8254 timer is connected, usually to INT2. This prevents the kernel from taking corrective action, and a kernel lockup results.

Solution:

FIXED.

A kernel option to override the offending MP table entry is available, see this page for details.
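The decision described above can be sketched as follows. This is an illustration only: struct int_entry is a simplified, hypothetical view of an MP table INT entry, and use_mixed_mode() and its force_mixed flag are invented names standing in for the kernel's logic and the override option, whose real name is not given here. The only hardware fact relied on is that the 8254 timer is ISA IRQ 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified, hypothetical view of an MP table INT entry. */
struct int_entry {
    int src_irq;       /* source bus IRQ; the 8254 timer is ISA IRQ 0 */
    int dst_apic_int;  /* IO APIC input the table says it is wired to */
};

/*
 * Decide whether to use "mixed-mode" programming for the timer.
 * If the table has no entry for IRQ 0, the timer is taken to be
 * unconnected to the IO APIC and mixed mode is used. If the table
 * claims a connection that does not exist, only the override
 * (force_mixed) can select mixed mode.
 */
static bool
use_mixed_mode(const struct int_entry *ints, int n, bool force_mixed)
{
    assert(n >= 0);

    if (force_mixed)
        return true;

    for (int i = 0; i < n; i++) {
        if (ints[i].src_irq == 0)
            return false;   /* table claims the timer reaches the APIC */
    }
    return true;
}
```

A table that lists the timer on INT2 makes the function return false unless the override is set, which mirrors why the bad entry has to be overridden by hand.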