NVidia (Mellanox), the main proponent of Infiniband, has good support for Infiniband virtualization. With NVidia ConnectX cards and SR-IOV support, this works with KVM under Redhat, Centos, Debian and other Linux distributions, on both x86_64 and ARM. With SR-IOV, you install the OFED on the host and in the VMs, and SR-IOV provides each VM with an interface that looks like a hardware network card. KVM also has a number of network/bridging options (NAT, routed, isolated) that let you build an arbitrary number of virtual networks, with or without physical or virtual function adapters.
Make sure that your Infiniband cards support Single Root IO Virtualization (SR-IOV). SR-IOV presents what appears to be a dedicated physical network card, GPU or other IO device to each VM. Not all Infiniband cards support SR-IOV.
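A quick way to check is to look for the SR-IOV capability in the card's PCI configuration space. The PCI address below is only an example; find yours with "lspci | grep Mellanox":
  "sudo lspci -vvv -s 82:00.0 | grep -i 'Single Root'"
If the card supports it, this should print a "Single Root I/O Virtualization (SR-IOV)" capability line.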
SR-IOV background:
NVidia Single Root IO Virtualization
https://www.lynx.com/embedded-systems-learning-center/what-is-sr-iov-and-why-is-it-important-for-embedded-devices
https://en.wikipedia.org/wiki/Single-root_input/output_virtualization
I will assume you are using an Intel CPU, NVidia ConnectX cards, the NVidia OFED, and Redhat 7.9 or Centos 7.9 or later. This is also possible on other versions of Linux; I am most familiar with Centos and Redhat.
- Your computer's BIOS has to support CPU virtualization, an IOMMU and SR-IOV
- Turn on CPU virtualization in the BIOS; on Intel processors this is also called VT-x
- Turn on the IOMMU (Input Output Memory Management Unit) in the BIOS; on Intel processors this is also called VT-d
- Turn on SR-IOV support in the BIOS
- The Linux kernel needs to have the IOMMU turned on. For Intel processors, you pass "intel_iommu=on" to the kernel; I always add IOMMU passthrough, "iommu=pt", as well. If you are using Redhat, edit /etc/default/grub and add these options to the GRUB_CMDLINE_LINUX line.
https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/installation_guide/appe-configuring_a_hypervisor_host_for_pci_passthrough
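For example, the relevant line in /etc/default/grub might end up looking something like this (your other options will differ):
  GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet intel_iommu=on iommu=pt"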
- "sudo dracut --force"
- "sudo grub2-mkconfig -o /boot/..path_to_your_grub_cfg". For non-UEFI, this is "grub2-mkconfig -o /boot/grub2/grub.cfg"
- If this is successful, you should see IOMMU and DMAR (DMA Remapping) messages during boot by looking at the output of "dmesg"
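For example, something like this should turn up those lines (the exact messages vary by kernel version):
  "sudo dmesg | grep -i -e DMAR -e IOMMU"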
- From NVidia, download and install a recent, supported NVidia OFED matching your Infiniband cards, OS and processor, and get it working on the host.
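The package name and version below are only examples; a typical install from the OFED tarball looks something like:
  "tar xzf MLNX_OFED_LINUX-<version>-<distro>-x86_64.tgz"
  "cd MLNX_OFED_LINUX-<version>-<distro>-x86_64"
  "sudo ./mlnxofedinstall"
  "sudo /etc/init.d/openibd restart"
  "ibstat"
"ibstat" should show your ConnectX ports, and they should go Active once a subnet manager is running on the fabric.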
- From NVidia, download and install the Mellanox Firmware Tool, mft.
- From NVidia, download and install the latest firmware that is supported by your combination of OFED, OS, processor and NVidia ConnectX card.
- Use NVidia MST and Flint to update the firmware on the ConnectX card.
https://network.nvidia.com/support/firmware/update-instructions/
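The MST device name and firmware image name below are examples only; "mst status" will show your actual device:
  "sudo mst start"
  "sudo mst status"
  "sudo flint -d /dev/mst/mt4119_pciconf0 query"
  "sudo flint -d /dev/mst/mt4119_pciconf0 -i fw-ConnectX5-<version>.bin burn"
"flint query" reports the firmware currently on the card so you can compare it against the image you downloaded.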
- You have to use mlxconfig to enable SR-IOV in the card's firmware configuration and to set the number of Virtual Functions, the number of virtual ConnectX cards you are going to support. "mlxconfig -d /dev/mst/YOUR_INTERFACE_pciconf0 set SRIOV_EN=1 NUM_OF_VFS=4". The new settings take effect after a reboot or firmware reset.
https://support.mellanox.com/s/article/HowTo-Configure-SR-IOV-for-ConnectX-4-ConnectX-5-ConnectX-6-with-KVM-Ethernet
https://mymellanox.force.com/mellanoxcommunity/s/article/howto-configure-sr-iov-for-connect-ib-connectx-4-with-kvm--infiniband-x
https://shawnliu.me/post/configuring-sr-iov-for-mellanox-adapters/
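After the reboot (or a firmware reset with mlxfwreset), you can verify the setting and, depending on your OFED/driver generation, may also need to create the virtual functions at runtime through sysfs. The device names here ("YOUR_INTERFACE_pciconf0", "mlx5_0") are examples:
  "sudo mlxconfig -d /dev/mst/YOUR_INTERFACE_pciconf0 query | grep -i -e SRIOV -e VFS"
  "echo 4 | sudo tee /sys/class/infiniband/mlx5_0/device/sriov_numvfs"
The second command asks the mlx5 driver to create four virtual functions; the Mellanox articles above cover this step in more detail.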
- "sudo lspci | grep Mellanox" should show both hardware and virtual function adapters.
- You need to unbind one or more of the virtual functions from the host and bind them to the desired virtual machine
https://support.mellanox.com/s/article/howto-configure-sr-iov-for-connect-ib-connectx-4-with-kvm--infiniband-x
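A minimal sketch, assuming the virtual function showed up at PCI address 0000:82:00.2 (yours will differ) and that you are using libvirt; "virsh nodedev-detach" detaches the VF from the host so it can be passed through:
  "sudo virsh nodedev-list | grep pci_0000_82"
  "sudo virsh nodedev-detach pci_0000_82_00_2"
For Infiniband you generally also need to assign node and port GUIDs to each virtual function before its link will come up; the Mellanox article above describes doing this through sysfs for the mlx5 driver.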
- KVM virt-manager or virsh will then allow you to add the virtual function as a PCI device in the virtual machine.
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/virtualization/sect-virtualization-adding_a_pci_device_to_a_host
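If you prefer virsh to virt-manager, save a hostdev definition like this (the PCI address is an example) as, say, vf-hostdev.xml:
  <hostdev mode='subsystem' type='pci' managed='yes'>
    <source>
      <address domain='0x0000' bus='0x82' slot='0x00' function='0x2'/>
    </source>
  </hostdev>
then attach it; managed='yes' lets libvirt handle the host unbind/rebind, "--config" makes the change persistent, and adding "--live" applies it to a running VM:
  "sudo virsh attach-device YOUR_VM vf-hostdev.xml --config"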
- Install the OFED in the virtual machine, just as you did on the host.
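Once the OFED is up in the guest, the virtual function should look like an ordinary Infiniband HCA there; a quick check is:
  "ibstat"
  "ibv_devinfo"
Both should list the virtual function, and its port should go Active once the subnet manager has registered it.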