Many PCI devices (e.g. GPUs) are multifunction.
For instance, for an NVIDIA 3090:
02:00.0 VGA compatible controller: NVIDIA Corporation GA102 [GeForce RTX 3090] (rev a1)
02:00.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1)
However, OpenStack does not seem to treat these functions as a single unit: it passes through the primary function (like 02:00.0) as one device, and the other functions can only be passed through separately, as independent devices.
When passing through all devices separately on multi-GPU servers, we sometimes get this situation:
mysql> use nova;
mysql> select address, product_id, dev_id, status from pci_devices where deleted = 0;
+--------------+------------+------------------+-----------+
| address | product_id | dev_id | status |
+--------------+------------+------------------+-----------+
| 0000:01:00.0 | 24b0 | pci_0000_01_00_0 | allocated |
| 0000:01:00.1 | 228b | pci_0000_01_00_1 | available |
| 0000:45:00.0 | 24b0 | pci_0000_45_00_0 | available |
| 0000:45:00.1 | 228b | pci_0000_45_00_1 | allocated |
+--------------+------------+------------------+-----------+
14 rows in set (0.00 sec)
As you can see, in this machine, OpenStack has passed through the audio device of PCI 45:00, whereas it has passed through the GPU of PCI 01:00.
This configuration is problematic, because each VM needs its GPU and its audio device to come from the same physical PCI card.
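To spot affected hosts, the mismatch can be detected mechanically. The following is only a rough sketch: it assumes the rows from the query above have been copied into a Python list, and the helper name split_slots is made up for illustration. It groups functions by their PCI slot (the address without the trailing function number) and reports slots whose functions do not all have the same status. Status is only a proxy here; a mixed status shows that a card is partially allocated, not which VM holds which function.

```python
from collections import defaultdict

# Rows copied by hand from the pci_devices query above: (address, status).
rows = [
    ("0000:01:00.0", "allocated"),
    ("0000:01:00.1", "available"),
    ("0000:45:00.0", "available"),
    ("0000:45:00.1", "allocated"),
]


def split_slots(rows):
    """Return the slots whose functions do not all share the same status."""
    by_slot = defaultdict(dict)
    for address, status in rows:
        # "0000:01:00.0" -> slot "0000:01:00", function "0"
        slot, function = address.rsplit(".", 1)
        by_slot[slot][function] = status
    return {
        slot: funcs
        for slot, funcs in by_slot.items()
        if len(set(funcs.values())) > 1
    }


if __name__ == "__main__":
    for slot, funcs in sorted(split_slots(rows).items()):
        print(slot, funcs)
```

With the four rows shown, both 0000:01:00 and 0000:45:00 are reported, since each card has one function allocated and the other still available.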
Any thoughts or advice?
One option is to skip OpenStack entirely and just use Libvirt with "multifunction=on", but I'm really curious whether OpenStack has some similar functionality, so that all slots / subfunctions passed through into a VM come from the same physical PCI device.
Thanks!