Comment 2 for bug 1797990

Guilherme G. Piccoli (gpiccoli) wrote:

One problem we faced with this approach is that the x86 early-quirks code performs a recursive search of the PCI bus hierarchy, starting from the "first" bus 0000:00 and walking through all secondary buses by following bridges. For a historical perspective on how this code evolved, see [0].

This is not enough on multi-processor systems, which may have multiple PCIe root complexes, each exposing its own root ports and thus defining separate hierarchy domains. The PCIe spec does not even guarantee that those hierarchies can communicate with each other; from the PCIe 3.0 spec, section 1.3.1: "[...] The capability to route peer-to-peer transactions between hierarchy domains through a Root Complex is optional and implementation dependent. For example, an implementation may incorporate a real or virtual Switch internally within the Root Complex to enable full peer-to-peer support in a software transparent way."

In practice, we rarely see PCI devices that are unable to communicate with each other just because they sit under different host bridges (aka root complexes in PCIe terminology). But from a software perspective, Linux sees the PCI devices organized as trees hanging from root buses. Since the naive recursion in check_dev_quirk() (arch/x86) always starts from bus 0000:00 and only follows bridges below it, it cannot reach the root buses of the other root complexes.

To illustrate what this tree looks like with a single root bridge and with multiple root bridges, we'll attach the output of "lspci -t" from 2 systems next.

So we needed to change the bus scanning process to be comprehensive and walk through all buses. Good references on how the BIOS probes multi-root-complex PCIe systems (including the rationale behind bus numbering) are [1] and [2].

[0] The early PCI scan dates back to the BitKeeper era: it was added by Andi Kleen's "[PATCH] APIC fixes for x86-64" in October 2003. It initially restricted the search to the first 32 buses and slots. Due to a potential bug found in Nvidia chipsets, the scan was then changed to run only on the first root bus: see commit 8659c406ade3 ("x86: only scan the root bus in early PCI quirks").
Finally, scanning of secondary buses reachable from the first bus was reintroduced by commit 850c321027c2 ("x86/quirks: Reintroduce scanning of secondary buses").

[1] https://codywu2010.wordpress.com/2015/11/29/how-modern-multi-processor-multi-root-complex-system-assigns-pci-bus-number/

[2] PCI Firmware Specification and ACPI Specification.