Private cloud GPU virtualization similar to Amazon Web Services Cluster GPU instances

Question

I am searching for options that enable dynamic cloud-based NVIDIA GPU virtualization similar to the way AWS assigns GPUs for Cluster GPU Instances.

My project is working on standing up an internal cloud. One requirement is the ability to allocate GPUs to virtual-machines/instances for server-side CUDA processing.

USC appears to be working on OpenStack enhancements to support this but it isn't ready yet. This would be exactly what I am looking for if it were fully functional in OpenStack.

NVIDIA VGX seems to only support allocation of GPUs to USMs, which is strictly remote-desktop GPU virtualization. If I am wrong and VGX does enable server-side CUDA computing from virtual-machines/instances, please let me know.

It's possible to assign GPUs to VMs using the [Xen HVM hypervisor](http://wiki.xen.org/wiki/XenVGAPassthrough). It's a non-trivial setup, however, and in all probability there are many rough edges in operating it. The assignment has to be done before the VM is booted. And it is in effect a 1:1 mapping of GPUs to VMs: you cannot share a single GPU between multiple VMs simultaneously this way (using PCI passthrough). — Robert Crovella, Jan 24 '13 at 18:40
@Robert Crovella - Thanks. I'd really like to find something that would integrate more seamlessly with a cloud management tool, but having a possible option is at least a start. I'll have to investigate if the full CUDA API is available. — Bob B, Jan 24 '13 at 22:25
@Robert Crovella is spot on, but if you're going to try it with Xen then there are a few prerequisites: a CPU with Intel VT-d or AMD IOMMU (not likely a problem nowadays); a GPU "enabled" for VT-d/IOMMU pass-through support (NVIDIA seems to call this Multi-OS), which pretty much means M-series Teslas and Quadros; and Xen 4.1 (and up, maybe?). — Blairo, Feb 14 '13 at 11:14
It could be a while, but I may look into this. As of right now the plan is to test the new OpenStack beta with the GPU virtualization blueprint from USC included in it. This wasn't out yet when I originally asked. It is scheduled to be included in the general release in April. I will report my findings. — Bob B, Feb 14 '13 at 14:24
You might consider taking advantage of OpenStack physical-layer provisioning if this is an HPC use case. — Matt Joyce, Mar 16 '13 at 21:46
Has anyone tried this with SLI? I'd like to SLI two cards together and assign them to one VM. — John Thompson, Aug 21 '13 at 02:48

score 4 · Accepted Answer · answered May 07 '13 at 00:48

"dynamic cloud-based NVIDIA GPU virtualization similar to the way AWS assigns GPUs for Cluster GPU Instances."

AWS does not really allocate GPUs dynamically: each Cluster GPU instance has 2 fixed GPUs. All other server types (including the regular Cluster Compute instances) don't have any GPUs. That is, there is no API where you can say "GPU or not"; it's tied to the instance type, which runs on fixed hardware.

The pass-through mode on Xen was made specifically for your use case: passing hardware through from the host to the guest. It's not 'dynamic' by default, but you could write some code that chooses which guest gets each card on the host.
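As a rough illustration of "some code that chooses one of the guests to get each card", here is a minimal Python sketch. It only does the bookkeeping: it tracks which host GPUs are free, reserves them for a guest that has not booted yet, and renders the `pci = [...]` line that goes into that guest's xl config file. The class name `GpuPool`, the helper `xl_pci_line`, the guest name, and the PCI addresses are all hypothetical. The sketch also assumes the host has already made those devices assignable to guests, which it does not attempt to do.

```python
from dataclasses import dataclass, field


@dataclass
class GpuPool:
    """Tracks which host GPUs (by PCI address) are free or bound to a guest."""
    free: list[str]
    assigned: dict[str, list[str]] = field(default_factory=dict)

    def allocate(self, guest: str, count: int = 1) -> list[str]:
        """Reserve `count` free GPUs for a guest that has not booted yet."""
        if guest in self.assigned:
            raise ValueError(f"{guest} already holds GPUs; release them first")
        if count > len(self.free):
            raise RuntimeError(
                f"only {len(self.free)} GPU(s) free, {count} requested"
            )
        gpus, self.free = self.free[:count], self.free[count:]
        self.assigned[guest] = gpus
        return gpus

    def release(self, guest: str) -> None:
        """Return a stopped guest's GPUs to the pool."""
        self.free.extend(self.assigned.pop(guest, []))


def xl_pci_line(gpus: list[str]) -> str:
    """Render the `pci = [...]` line for a Xen xl guest config file."""
    return "pci = [ " + ", ".join(f"'{bdf}'" for bdf in gpus) + " ]"


# Hypothetical host with two GPUs; the addresses are made up for illustration.
pool = GpuPool(free=["0000:05:00.0", "0000:06:00.0"])
line = xl_pci_line(pool.allocate("cuda-guest-1", count=2))
print(line)
```

Because passthrough is a 1:1 mapping decided before boot, the pool hands out whole cards and refuses a request it cannot fully satisfy. Calling `release` after a guest shuts down is what makes the cards available to the next guest.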

answered May 07 '13 at 00:48

BraveNewCurrency

A Cluster GPU instance is still a VM running on top of the Xen hypervisor though, right? So when one instance stops, you can reassign GPUs it was using to a new instance, right? You can't do the assignment when either VM is on, but that's fine -- you are still dynamically allocating GPU resources to VM instances. Am I correct? – John Thompson Aug 28 '13 at 19:41
Yes it's running under Xen. But no, you don't assign them: AWS does. When you ask for a cg1.4xlarge, you get a box on a different rack because they have GPUs and other boxes don't. Most likely, they statically map the GPUs to the instances, since there must be 2 GPUs for each instance. – BraveNewCurrency Sep 22 '13 at 00:05

score 0 · Answer 2 · answered Sep 30 '14 at 15:05

There is a solution called GPUBox that virtualizes the devices within CUDA. It can be used either on Amazon or your own infrastructure.

Quote from the website (http://renegatt.com/solutions.php):

The GPUBox software simplifies GPU management by separating the application and operating systems from the underlying GPU devices. It is a solution that allows the dynamic sharing of GPU devices from the same pool, by many users. (...) GPUBox enables on-demand provisioning of GPU devices to a physical or virtual machine with a Linux or Windows operating system. The pool of GPU devices is shared among users which leads to reduction in the total power consumption and idle-running hardware.