
I'm using several PCIe 3.0 expansion cards (GPUs and InfiniBand interconnects). I'm wondering how lanes are actually managed and whether I could optimize my devices by changing ports or by using adapters (x16 -> x8). Intel Haswell-EP can manage 40 PCIe 3.0 lanes. On Intel's schematics, the PCIe 3.0 controller seems to be split into two x16 sub-bridges and one x8 sub-bridge.

On some commercial schematics for the Haswell-EP CPU, one can read:

Up to 40 PCIe Gen3 Lanes 2x16 + 1x8 up to 3x8 Graphics.

Are all devices connected to a main PCIe bridge (with the number of lanes automatically negotiated for each device), or does the motherboard connect the devices directly to one of the supposed three sub-bridges (x16, x16 and x8), with the number of lanes then negotiated within each of those sub-bridges?

I do not have direct access to the motherboard to see how devices are connected, but I suspect that the lanes of the supposed x8 sub-bridge are not utilized. I would also like to know whether, by using an x16 to x8 adapter, I could harness more lanes and increase my total PCIe bandwidth (even though the maximum theoretical bandwidth would be halved for that device).

[edit]

Example of what I obtain for one CPU socket with lstopo:

HostBridge L#0
  PCIBridge
    PCI 15b3:1011
      Net L#16 "ib0"
      OpenFabrics L#17 "mlx5_0"
  PCIBridge
    PCI 8086:1d6b
  PCIBridge
    PCI 102b:0532
      GPU L#18 "card0"
      GPU L#19 "controlD64"
jyvet
  • perhaps this belongs in https://electronics.stackexchange.com/ ? SO is software-oriented. – WeaponsGrade Mar 20 '16 at 01:12
  • this is for systems programming/performance. But you are right, this is more a hardware-oriented question. I thought some programmers with experience in GPU/InfiniBand programming might have the answer. Anyway, I'll try on electronics.stackexchange.com after the expiration of the bounty attribution time. – jyvet Mar 20 '16 at 13:32

1 Answer


Are all devices connected to a main PCIe bridge (with the number of lanes automatically negotiated for each device), or does the motherboard connect the devices directly to one of the supposed three sub-bridges (x16, x16 and x8), with the number of lanes then negotiated within each of those sub-bridges?

This is a function of motherboard design, at least in part, so a specific answer cannot be given. But assuming your motherboard has no additional PCIE hardware such as PCIE switches, it's likely that it has at least one PCIE x16 "port" plus some number of other "ports", i.e. slots, which may have varying "widths": x16, x8, x4, x2, x1, etc.

A modern Intel CPU has an internal PCIE "root complex" which is shared by all the lanes leaving the device. The lanes leaving the device will be grouped into one or more "ports". The PCIE root complex is a logical entity, whereas the ports have both a logical and physical character to them.

There is automatic lane width negotiation, but this is usually only in place as a support and error mitigation strategy. A x16 port will expect to negotiate to x16 width if a x16 "endpoint" (i.e. device) is plugged into it (it may also negotiate to a lower width if errors are detected that are localizable to particular lanes). Usually a port can handle a device of lesser width, so if a x8 device is plugged into a x16 port, things will usually "just work", although this does not usually mean that you have 8 additional lanes you can use "somewhere else".
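
For example, on Linux you can compare a port's designed width with the width it actually negotiated. This is a minimal sketch: the bus address 0000:03:00.0 is just a placeholder for one of your own devices, and the sysfs attributes are only present on reasonably recent kernels:

    # Placeholder address: substitute one of your own devices (list them with plain lspci).
    # LnkCap reports the maximum width/speed the port was built for,
    # LnkSta reports what was actually negotiated with the plugged-in device.
    sudo lspci -s 03:00.0 -vvv | grep -E 'LnkCap:|LnkSta:'

    # Recent kernels expose the same information through sysfs:
    cat /sys/bus/pci/devices/0000:03:00.0/max_link_width
    cat /sys/bus/pci/devices/0000:03:00.0/current_link_width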

Reconfiguration of a x16 port to two x8 ports is not something that would normally automatically occur by plugging in a "x16 to x8 adapter", whatever that is. You could certainly reduce a x16 port to a x8 port, but that does not give you 8 extra lanes to use elsewhere automatically.

The process of subdivision of the 40 lanes exiting your Haswell device into logical "ports" involves both hardware design of the motherboard as well as firmware (BIOS) design. A x16 port cannot automatically be split into two (logical) x8 ports. Some motherboards have such configuration options and they are usually selected by some explicit means such as BIOS configuration or modification of a switch or routing PCB, along with the provision of two slots, one for each of the possible ports.

What is fairly common, however, is the use of PCIE switches. Such switches allow a single PCIE (upstream) port to service two (or more) downstream ports. This does not necessarily have to imply conversion of x16 logical character to x8 logical character (although it might, depending on implementation), but it will usually imply whatever bandwidth limit is in place for the upstream port is applied in aggregate to the downstream ports. Nevertheless, this is a fairly common product strategy, and you can find examples of motherboards which have these devices designed into them (to effectively provide more slots, or ports) as well as adapters/planars, which can be plugged into an existing port (i.e. slot) and will provide multiple ports/slots from that single port/slot.
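
As a quick way to spot such a switch in practice, the `lspci` tree view shows it as one upstream bridge fanning out into several downstream bridges, each leading to an endpoint (this is just a generic command sketch; the exact topology printed depends on your board):

    # -t prints the PCIE bus hierarchy as a tree; -v adds vendor/device names.
    # A switch shows up as a bridge that itself contains several bridges.
    lspci -tv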

On Linux, the lstopo command is useful for discovering these topologies. You may need to install the hwloc package in your distro.
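
For example (a minimal sketch; package and command names vary slightly between distros):

    # Debian/Ubuntu
    sudo apt-get install hwloc
    # RHEL/CentOS/Fedora
    sudo yum install hwloc

    # Text-mode topology dump, including PCI bridges and the devices behind them
    lstopo-no-graphics        # on some distros simply: lstopo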

Robert Crovella
  • On the Haswell, do you know whether there is a single upstream port or three different ones? In other words, could the 40 lanes be divided, for instance, into two logical ports (one x32 for very uncommon devices and one x8) if the motherboard is designed to do so, or do the ports have to be x16 + x16 + x8 (with lane width negotiation then happening inside those subsets)? I'd like to understand whether a PCIe switch could harness all 40 lanes at the same time for many devices, or whether the maximum number of upstream lanes connected to the same switch is 16 due to the design of the Haswell PCIe controller. – jyvet Mar 22 '16 at 10:18
  • I used lstopo before asking my question. Since I saw 3 PCIBridges as first-level subparts of the same HostBridge (see the updated post), I thought it was because of a x16 + x16 + x8 subdivision constraint. Actually, on other machines I can see more than 3 first-level subparts. I guess we cannot deduce from lstopo the maximum number of lanes connected to each PCIBridge, for instance to be able to compute the maximum bandwidth for all attached devices. – jyvet Mar 22 '16 at 10:38
  • If you're talking about designing your own motherboard, and BIOS, you can probably create a x32 port, I would guess (although possibly Intel simply doesn't support that). But it's very uncommon. Once a motherboard is designed, however, it's generally not possible (excepting the manual-config x16/x8 example I gave) to reassign lanes to ports. You could not harness all 40 lanes for a single device. The number of lanes (i.e. the width) of a given port is discoverable from `lspci`, if you look at the full range of data (`-vvv`): both the hardware (i.e. design) width and the negotiated width. – Robert Crovella Mar 22 '16 at 13:17
  • Thanks for the reply. It helped a lot. With `lspci -vvv` I got this info: PCI bridge: Intel Xeon E7 v2/Xeon E5 v2/Core i7 PCI Express Root Port 1a: **LnkCap: Port #0, Speed 8GT/s, Width x8**; PCI bridge: Intel Xeon E7 v2/Xeon E5 v2/Core i7 PCI Express Root Port 2a: **LnkCap: Port #0, Speed 8GT/s, Width x16**; PCI bridge: Intel Xeon E7 v2/Xeon E5 v2/Core i7 PCI Express Root Port 3a: **LnkCap: Port 07, Speed 8GT/s, Width x8** – jyvet Mar 22 '16 at 13:46
  • **LnkCap** gives you the hardware capabilities. **LnkSta** gives you the negotiated capabilities. I think you've already figured out that the `lspci` output will be different if you are root. You need to be root to get the full output. – Robert Crovella Mar 22 '16 at 13:54
  • @RobertCrovella, "you can probably create a x32 port I would guess" - you can't electrically merge two physical x16 ports into an x32 port, as there is no x32-wide [data striping](http://m.eet.com/media/1076316/0906esdQuinell03.gif) and no x32-wide [deskew (Elastic Buffers)](http://patentimages.storage.googleapis.com/US20140056370A1/US20140056370A1-20140227-D00002.png) in the CPU (check RX- and TX-LANE_WIDTH_SUPPORTED, Bit 6 - https://pcisig.com/sites/default/files/specification_documents/ECN_M-PCIe_22_May_2013.pdf: "1: x32 TX-LANE is supported", "1: x32 RX-LANE is supported") – osgx May 07 '17 at 20:12
  • @osgx I didn't say anything about electrically merging anything. I'm not sure what the point of your comment is. – Robert Crovella May 08 '17 at 22:01
  • Robert, there would have to be hardware capable of x32 ports in order to create an x32 port. If the Xeon has x16 ports, a pair of them can't be combined into one x32 port, either electrically or by a BIOS patch. The only way to create an x32 port is to use a switch, and such a switch cannot present an x16 + x16 upstream link to a single root complex. – osgx May 08 '17 at 22:13