Note: the problems and solutions are described in terms of the conditions at the time each problem was addressed. Any advice for a solution may no longer be relevant to the -current source. In any case, it should not be an issue for anyone running -current, as the problem should no longer exist.
During a page fault, smp_invltlb() is generally called at some point to flush the TLB on the other CPUs. smp_invltlb() calls allButSelfIPI(), which sends an IPI to the other processor. Unfortunately, that processor is sometimes already processing an interrupt of a higher priority. Its interrupt routine then spins trying to obtain the mp_lock spin lock so it can enter the kernel, but the processor holding that lock is itself in a spin loop in apicIPI(), waiting for the IPI to be delivered. Neither side can make progress. This is only a problem when the APIC FIFO is already full, preventing delivery of additional INT messages.
Perhaps we could release the mp_lock while sending an IPI, and try to grab it back again before continuing. Alternatively, have a timeout on the IPI: if the APIC hasn't recovered after a certain amount of time (i.e., it is indefinitely "busy"), release the mp_lock for a moment, wait, and check the status again before reacquiring the lock. If it still fails to recover, panic rather than hang forever.
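The timeout idea above can be sketched in user-space C. This is only an illustration, not the kernel code: ipi_ops, ipi_wait_or_yield() and the sim_* helpers are hypothetical names, and the APIC and mp_lock are simulated by a small struct so the logic can be run outside a kernel.

```c
#include <stdbool.h>

/* Hypothetical hooks. In a kernel these would poll the local APIC
 * delivery status and release/acquire mp_lock. */
struct ipi_ops {
    bool (*apic_busy)(void *ctx);    /* true while the IPI is still pending */
    void (*lock_release)(void *ctx);
    void (*lock_acquire)(void *ctx);
    void *ctx;
};

enum ipi_result { IPI_DELIVERED, IPI_STUCK };

/* Spin with the lock held; on timeout drop the lock, spin again,
 * then retake the lock. IPI_STUCK means the caller should panic. */
enum ipi_result
ipi_wait_or_yield(const struct ipi_ops *ops, unsigned spins)
{
    bool delivered = false;
    unsigned i;

    for (i = 0; i < spins; i++)
        if (!ops->apic_busy(ops->ctx))
            return IPI_DELIVERED;

    /* Timed out: let the other CPU take the lock and drain its work. */
    ops->lock_release(ops->ctx);
    for (i = 0; i < spins; i++) {
        if (!ops->apic_busy(ops->ctx)) {
            delivered = true;
            break;
        }
    }
    ops->lock_acquire(ops->ctx);

    return delivered ? IPI_DELIVERED : IPI_STUCK;
}

/* Simulated hardware, for experimentation only. */
struct sim_apic {
    bool lock_held;       /* does "this CPU" hold the simulated lock? */
    bool needs_unlock;    /* remote CPU only progresses while lock is free */
    unsigned polls_left;  /* productive polls until the APIC goes idle */
    unsigned releases;    /* how many times the lock was dropped */
};

bool sim_busy(void *ctx)
{
    struct sim_apic *s = ctx;

    if (s->polls_left == 0)
        return false;
    if (s->needs_unlock && s->lock_held)
        return true;      /* deadlock case: no progress while we hold the lock */
    s->polls_left--;
    return s->polls_left != 0;
}

void sim_release(void *ctx)
{
    struct sim_apic *s = ctx;
    s->lock_held = false;
    s->releases++;
}

void sim_acquire(void *ctx)
{
    struct sim_apic *s = ctx;
    s->lock_held = true;
}
```

With needs_unlock set, the first spin phase never succeeds and delivery only completes once the lock has been dropped, which mirrors the deadlock described above.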
My latest fix appears to work:
The MP spec, appendix D.2, specifies that when MORE than 1 PCI bus exists, the PCI buses should be assigned IDs first, using the actual PCI bus numbers. IDs are then assigned to the other buses using whatever numbers are free.
BIOSes with ONLY ONE PCI bus on the MOTHERBOARD often list the ISA/EISA bus first, thus making the one and only PCI bus have ID 1. When a PCI bridge card is installed, a second PCI bus exists, causing the PCI bus numbering (i.e., the ID #s) to be incorrect. Even when no such card exists, having the ISA bus first causes the PCI bus ID to NOT match the actual PCI bus #. This makes identifying PCI bus:device:int info from the MP table INTs section difficult and error-prone. To work around this problem, the current code ignores the bus ID when identifying PCI INT associations, which can cause errors on systems with multiple PCI buses.
See MP spec v1.4, D.2 & D.3 for details on required format.
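One way to picture the mismatch is a translation table from MP table bus IDs to PCI bus numbers. The sketch below is not what the kernel does (the text above says it ignores the bus ID). It assumes, as a labeled simplification, that PCI buses appear in the table in ascending PCI bus number order. struct bus_entry is a reduced, hypothetical form of the MP table bus entry, keeping only the bus ID and the 6-character bus type string.

```c
#include <stddef.h>
#include <string.h>

#define NO_PCI_BUS (-1)

/* Simplified bus entry: ID plus space-padded type string, no NUL. */
struct bus_entry {
    unsigned char bus_id;
    char bus_type[6];     /* e.g. "PCI   ", "ISA   ", "EISA  " */
};

/* Fill id_to_pci[] so that id_to_pci[bus_id] is the PCI bus number,
 * or NO_PCI_BUS for non-PCI buses and unused IDs.
 * ASSUMPTION: the n-th PCI entry in the table is PCI bus n. */
void
map_bus_ids(const struct bus_entry *entries, size_t n, int id_to_pci[256])
{
    int next_pci = 0;
    size_t i;

    for (i = 0; i < 256; i++)
        id_to_pci[i] = NO_PCI_BUS;

    for (i = 0; i < n; i++)
        if (memcmp(entries[i].bus_type, "PCI   ", 6) == 0)
            id_to_pci[entries[i].bus_id] = next_pci++;
}
```

For a table that lists ISA as ID 0 and PCI as ID 1, this maps ID 1 back to PCI bus 0, which is the case described above.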
When smp_active == 0, all the lock code is ignored. If you start the additional CPU(s) via sysctl and then attempt to stop them by setting smp_active to '0' (instead of the proper value of '1'), multiple CPUs enter the kernel at the same time, causing random corruption and crashes.
Added a SYSCTL_PROC to limit the range of legal values.
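The range check amounts to something like the following user-space sketch. It is not the actual SYSCTL_PROC handler: smp_active_set() is a hypothetical name, and the legal range of 1 through the number of CPUs is an assumption drawn from the description above (0 is the value that must be rejected once the extra CPUs are running).

```c
#include <errno.h>

/* Accept a new smp_active value only if it is within [1, ncpus].
 * Returns 0 on success, EINVAL if the request is out of range;
 * on failure the current value is left untouched. */
int
smp_active_set(int *smp_active, int requested, int ncpus)
{
    if (requested < 1 || requested > ncpus)
        return EINVAL;
    *smp_active = requested;
    return 0;
}
```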
Some (most?) BIOSes have a hard-coded MP table in ROM, meaning that bridged PCI cards and other complex PCI setups will NOT be properly described in the MP table. As such, the table does not contain enough information to properly set up the APIC for these cards.
See the page on bridged PCI cards for details of a workaround. The long-term solution will be a pass where we build an MP table in core based on information received from the PCI subsystem, i.e., we will toss certain sections of the motherboard-provided MP table!
isa_device.h:id_irq promoted to u_int for both UP and SMP kernels.
Some machines have a 2nd IO APIC available. If it is enabled in the BIOS the mptable will have references to it which confuse the SMP kernel.
FIXED in 4.0/5.0 as of 4.21.2000.
On some motherboards the 8254 timer output is not connected to the IO APIC. If the MP table correctly shows this fact by the lack of an INT entry for the 8254 timer, the kernel will enable "mixed-mode" programming, getting around the problem. However, not all MP tables do so. Instead, they declare that the 8254 timer is connected, usually to INT2. This prevents the kernel from taking corrective action, and a kernel lockup results.
A kernel option to override the offending MP table entry is available, see this page for details.
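The decision described above can be sketched as a scan of the MP table I/O interrupt entries. This is an illustration under stated assumptions, not the kernel's code: struct io_int_entry is a reduced, hypothetical form of the entry, the 8254 timer is taken to be ISA IRQ 0, and distrust_timer_entry stands in for the kernel option mentioned above, whose real name is not given here.

```c
#include <stdbool.h>
#include <stddef.h>

#define MP_INT_TYPE_INT 0   /* vectored interrupt, per the MP spec */

/* Simplified I/O interrupt assignment entry. */
struct io_int_entry {
    unsigned char int_type;      /* 0 = INT, 1 = NMI, 2 = SMI, 3 = ExtINT */
    unsigned char src_bus_id;
    unsigned char src_bus_irq;
    unsigned char dst_apic_id;
    unsigned char dst_apic_intin;
};

/* Return true if the kernel should fall back to mixed mode for the
 * timer: either no INT entry routes ISA IRQ 0 to an IO APIC pin, or
 * the (hypothetical) override says the entry must not be trusted. */
bool
timer_needs_mixed_mode(const struct io_int_entry *entries, size_t n,
    unsigned char isa_bus_id, bool distrust_timer_entry)
{
    size_t i;

    if (distrust_timer_entry)
        return true;

    for (i = 0; i < n; i++) {
        if (entries[i].int_type == MP_INT_TYPE_INT &&
            entries[i].src_bus_id == isa_bus_id &&
            entries[i].src_bus_irq == 0)
            return false;    /* table claims the 8254 is wired up */
    }
    return true;
}
```

A table that wrongly lists IRQ 0 on INT2 makes the scan return false, so only the override can force mixed mode on such a board.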