
This question is copied over from Stack Overflow; I was told it would fit better here.

Despite searching for days, I could not find any good information about the following, though I surely cannot be the first to have this problem: We are working on a high-performance cluster with MATLAB, MPI and Infiniband. This setup has been working quite well over the last few years. But to gain more flexibility and easier maintenance, we are thinking about virtualizing the compute nodes with KVM.

Now I have the big problem of getting Infiniband "into" my virtual machines. I do not only want to pass through the PCI interface, but also to build something corresponding to an Ethernet bridge that I can connect to my machine(s) on the host. I found some sources that talk about this, but not how to install/configure IB. Does anyone out there have an idea how to do this?

Thanks in advance!

Daniel
  • Are you running IPoIB? Why do you want to virtualize at all? – Michael Hampton Oct 20 '14 at 11:56
  • Yes, we're running IPoIB. The reasons why we virtualize are manifold: different OS setups on the same hardware, different software versions, and sometimes even overcommitting of CPUs is necessary (despite the loss of performance). – Daniel Oct 20 '14 at 13:52
  • I know for certain that IPoIB setups with KVM work quite well; Mellanox have showcased quite a few use cases. IIRC, most of those were simple L2 bridges over an IB interface, with virtio_net in the guest. – dyasny Oct 20 '14 at 14:53

1 Answer


NVidia (Mellanox), the main proponent of Infiniband, has good support for Infiniband virtualization. With NVidia ConnectX cards and SR-IOV support, this works with KVM under Red Hat, CentOS, Debian and other Linux distributions, on both x86_64 and ARM. With SR-IOV you install the OFED on the host and in the VMs, and SR-IOV provides each VM with an interface that looks like a hardware network card. KVM also has a number of network/bridging options (NAT, routed, isolated) that let you build an arbitrary number of virtual networks, with or without physical or virtual function adapters.
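
As an illustration of the last point, libvirt (which virsh and virt-manager sit on top of) can also hand out SR-IOV virtual functions from a pool through a "hostdev" network definition, once the SR-IOV setup described below is in place. This is only a sketch; the network name "sriov-ib" and the PF device name "ib0" are assumptions that have to match your system, and the manual PCI attach shown further down works just as well:

    # sriov-ib.xml -- a libvirt network backed by the SR-IOV VFs of PF "ib0" (names are examples)
    <network>
      <name>sriov-ib</name>
      <forward mode='hostdev' managed='yes'>
        <pf dev='ib0'/>
      </forward>
    </network>

    # define and start the network on the host
    virsh net-define sriov-ib.xml
    virsh net-start sriov-ib
    virsh net-autostart sriov-ib

Guests then get a VF from the pool via an interface element of the form <interface type='network'><source network='sriov-ib'/></interface>.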

Make sure that your Infiniband cards support Single Root IO Virtualization, SR-IOV. SR-IOV presents what appears to be a dedicated physical network card, GPU or other IO device to each VM. SR-IOV is not supported on all Infiniband cards, so check yours first, for example as sketched below.
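
One way to check is sketched here; the PCI address 81:00.0 is only an example and the mst device name is a placeholder, both have to be replaced with your own values:

    # list Mellanox/NVidia devices and note the PCI address of your HCA
    lspci | grep Mellanox

    # the capability list of that device should contain an SR-IOV entry
    # (replace 81:00.0 with the address printed above)
    sudo lspci -s 81:00.0 -vvv | grep -i "Single Root"

    # once the mft tools (see further down) are installed, the firmware side can be queried too
    sudo mst start
    sudo mlxconfig -d /dev/mst/YOUR_INTERFACE_pciconf0 query | grep -e SRIOV_EN -e NUM_OF_VFS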

SR-IOV background:

NVidia Single Root IO Virtualization

https://www.lynx.com/embedded-systems-learning-center/what-is-sr-iov-and-why-is-it-important-for-embedded-devices

https://en.wikipedia.org/wiki/Single-root_input/output_virtualization

I will assume you are using an Intel CPU, NVidia ConnectX cards, the NVidia OFED, and Red Hat 7.9 or CentOS 7.9 or later. This is also possible on other Linux distributions; I am simply most familiar with CentOS and Red Hat.

  • Your computer's BIOS has to support CPU virtualization, the IOMMU and SR-IOV
  • Turn on CPU virtualization in the BIOS; on Intel processors this is also called VT-x
  • Turn on the IOMMU (Input Output Memory Management Unit) in the BIOS; on Intel processors this is also called VT-d
  • Turn on SR-IOV support in the BIOS
  • The Linux kernel needs to have the IOMMU turned on. For Intel processors, you pass "intel_iommu=on" to the kernel; I always also add IOMMU passthrough, "iommu=pt". If you are using Red Hat, edit /etc/default/grub and add these options to GRUB_CMDLINE_LINUX, as sketched after the link below.

https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.1/html/installation_guide/appe-configuring_a_hypervisor_host_for_pci_passthrough
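
For example, on a Red Hat/CentOS host the relevant line in /etc/default/grub might end up looking like this (a sketch; the other options shown are just typical defaults, keep whatever is already there and only append the IOMMU settings):

    # /etc/default/grub (excerpt)
    GRUB_CMDLINE_LINUX="crashkernel=auto rhgb quiet intel_iommu=on iommu=pt"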

  • "sudo dracut --force"
  • "sudo grub2-mkconfig -o /boot/..path_to_your_grub_cfg". For non-UEFI, this is "grub2-mkconfig -o /boot/grub2/grub.cfg"
  • If this is successful, you should see IOMMU and DMAR (DMA Remapping) messages during boot in the output of "dmesg", as sketched right after this list
  • From NVidia, download and install a recently supported NVidia OFED matching your Infiniband cards, OS and processor and get it working on the host.
  • From NVidia, download and install the Mellanox Firmware Tool, mft.
  • From NVidia, download and install the latest firmware that is supported by your combination of OFED, OS, processor and NVidia ConnectX card.
  • Use NVidia MST and Flint to update the firmware on the ConnectX card; a sketch follows the link below.
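
A minimal sketch of the initramfs/grub rebuild and the IOMMU check from the first three bullets above, assuming a non-UEFI Red Hat/CentOS host:

    sudo dracut --force                           # rebuild the initramfs
    sudo grub2-mkconfig -o /boot/grub2/grub.cfg   # regenerate grub.cfg (non-UEFI path)
    sudo reboot

    # after the reboot the IOMMU should show up in the kernel log
    dmesg | grep -e DMAR -e IOMMU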

https://network.nvidia.com/support/firmware/update-instructions/
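
And a sketch of the firmware check/update with the mft tools; the mst device name is a placeholder ("mst status" prints the real one) and fw-image.bin stands for whatever firmware image you downloaded:

    sudo mst start                                            # load the mst kernel modules
    sudo mst status                                           # lists the /dev/mst/* device names
    sudo flint -d /dev/mst/YOUR_INTERFACE_pciconf0 query                 # current firmware version
    sudo flint -d /dev/mst/YOUR_INTERFACE_pciconf0 -i fw-image.bin burn  # burn the downloaded image
    sudo reboot                                               # reboot so the new firmware is active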

  • You have to use mlxconfig to enable SR-IOV in the card's firmware configuration and to set the number of Virtual Functions, i.e. the number of virtual ConnectX cards you are going to support: "mlxconfig -d /dev/mst/YOUR_INTERFACE_pciconf0 set SRIOV_EN=1 NUM_OF_VFS=4". See the sketch after the links below.

https://support.mellanox.com/s/article/HowTo-Configure-SR-IOV-for-ConnectX-4-ConnectX-5-ConnectX-6-with-KVM-Ethernet

https://mymellanox.force.com/mellanoxcommunity/s/article/howto-configure-sr-iov-for-connect-ib-connectx-4-with-kvm--infiniband-x

https://shawnliu.me/post/configuring-sr-iov-for-mellanox-adapters/
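
Putting that together, a sketch of enabling SR-IOV and then actually creating the virtual functions at runtime; the HCA name mlx5_0 and the count of 4 are assumptions that have to match your card and the NUM_OF_VFS value set above:

    # enable SR-IOV and the number of VFs in the card's firmware configuration
    sudo mlxconfig -d /dev/mst/YOUR_INTERFACE_pciconf0 set SRIOV_EN=1 NUM_OF_VFS=4
    sudo reboot                                   # firmware configuration changes need a reset/reboot

    # after the reboot, create the VFs (mlx5_0 is an example name, check /sys/class/infiniband/)
    echo 4 | sudo tee /sys/class/infiniband/mlx5_0/device/sriov_numvfs

    # the VFs should now show up next to the physical function
    lspci | grep Mellanox

Note that the sriov_numvfs setting is not persistent across reboots, so you will probably want to put it into a boot-time script or udev rule.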

  • "sudo lspci | grep Mellanox" should show both hardware and virtual function adapters.
  • You need to unbind one or more of the virtual functions from the host and bind them to the desired virtual machine, as sketched after the link below.

https://support.mellanox.com/s/article/howto-configure-sr-iov-for-connect-ib-connectx-4-with-kvm--infiniband-x
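
A sketch of detaching a VF from the host with virsh; the PCI address 0000:81:00.2 is only an example, take the real addresses from the lspci output above. (If you use managed='yes' in the device XML below, libvirt does this detach/reattach for you automatically.)

    # find the VF's PCI address
    lspci -D | grep Mellanox

    # detach it from the host driver so it can be passed to a guest
    virsh nodedev-detach pci_0000_81_00_2

    # to give it back to the host later
    virsh nodedev-reattach pci_0000_81_00_2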

  • KVM virt-manager or virsh will then allow you to add the virtual function as a PCI device in the virtual machine; see the sketch after the link below.

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/virtualization/sect-virtualization-adding_a_pci_device_to_a_host
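
For virsh, a sketch of the device XML and the attach command; the guest name "node01" and the PCI address are examples and must match your VF:

    # vf-hostdev.xml -- pass the VF at 0000:81:00.2 (example address) to the guest
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x81' slot='0x00' function='0x2'/>
      </source>
    </hostdev>

    # attach it to the guest and keep it across restarts
    virsh attach-device node01 vf-hostdev.xml --persistent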

  • Install the OFED in the virtual machine, for example as sketched below.
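
Inside the guest this is the same procedure as on the host; a sketch, assuming the MLNX_OFED bundle matching the guest OS has already been downloaded and unpacked there:

    # from the unpacked MLNX_OFED directory inside the VM
    sudo ./mlnxofedinstall
    sudo dracut --force
    sudo reboot

    # the VF should now show up as an HCA inside the guest
    ibstat
    ibv_devinfo
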
edt11x