
Motivation:

First of all, even though I have some knowledge of computer science, software development and Linux server administration, I have never looked into server hardware and I am a total "newbie" at it. Sorry if this question is trivial to most of you.

I am developing software with quite intensive (single-precision) computing needs. To reach the required TFLOPS, I selected the OpenCL (2.1) framework and perform most of the computation on a high-end consumer AMD graphics card, using the CPU mostly to drive the GPU (Linux OS). I am now looking to extend this across multiple machines.
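
To give an idea of what I mean by "the CPU mostly drives the GPU", the host side looks roughly like the following (a minimal sketch, not my actual code; it only discovers the GPU, with the real kernels omitted):

    /* Minimal OpenCL host sketch: the CPU only discovers and drives the GPU. */
    #include <stdio.h>
    #include <CL/cl.h>

    int main(void) {
        cl_platform_id platform;
        cl_device_id device;
        cl_uint num_platforms = 0;
        char name[256];
        cl_uint compute_units = 0;
        cl_int err;

        /* Find a platform and a GPU device on it. */
        if (clGetPlatformIDs(1, &platform, &num_platforms) != CL_SUCCESS || num_platforms == 0) {
            fprintf(stderr, "No OpenCL platform found\n");
            return 1;
        }
        if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL) != CL_SUCCESS) {
            fprintf(stderr, "No GPU device found\n");
            return 1;
        }

        clGetDeviceInfo(device, CL_DEVICE_NAME, sizeof(name), name, NULL);
        clGetDeviceInfo(device, CL_DEVICE_MAX_COMPUTE_UNITS, sizeof(compute_units), &compute_units, NULL);
        printf("GPU: %s, %u compute units\n", name, compute_units);

        /* The real application creates a context and command queue here,
           builds its kernels and enqueues the heavy work on the GPU;
           the CPU only orchestrates. */
        cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, &err);
        if (err == CL_SUCCESS) clReleaseContext(ctx);
        return 0;
    }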

Looking at how to organize those machines, it quickly became evident that standard (consumer) ATX towers are not ideal: each brand has its own chassis shape, and they basically can't be stacked easily and conveniently in a 19" enclosure with nice cooling airflow, shared APC, cable management, etc.

With this goal in mind, I started looking into getting a rack cabinet with servers, and found that:

  • GPUs designed for HPC, like Instinct/Tesla, cost an order of magnitude more than consumer GPUs, mostly because they bring double-precision floating point, which is "slow" on consumer devices (and because they can be sold at that price to enterprises).
  • Even with those GPUs, a PCI-Express riser is needed.
  • GPU-ready servers allow only up to 2-slot graphics cards (current high-end consumer GPUs usually take 3 slots).
  • I found ATX 3U or 4U chassis designed for 19" cabinets. But mounting one of those with consumer hardware would mean giving up ECC, redundant power (APC), etc.

The question:

What should I consider when buying a server intended to host 1 or 2 consumer-grade GPUs?

I have already spent a lot of time searching the internet and could not get a basic understanding of the question. For example, here are some ideas that come to mind:

  • Is it a good idea, or even possible at all? Several web pages complain about the difficulty of making those systems work together: incompatibilities, driver issues, etc.
  • Can any 2U-3U server chassis hold a 3-slot graphics card? Or two of them?
  • Some servers (e.g. Gigabyte Gxxx) are specifically designed for HPC with GPUs. Does this really make any difference compared to standard servers (for instance HPE ProLiant, IBM, etc.)?
  • Do most servers support the PCI-Express v4 x16 required by consumer-grade GPU cards?
  • Is the airflow in the server enclosure compatible with a consumer-grade GPU card (which usually has 3 fans on the bottom)?
  • Are there any problems with power connections?
Adrian Maire
  • The first thing to consider is that Nvidia doesn't allow the use of consumer-grade GPUs in datacenter servers. – Gerald Schneider Jan 08 '21 at 08:12
  • Is this a legal restriction, or are you speaking of Nvidia cards having some limitation? (I am sort of against Nvidia anyway (monopoly strategies), but for completeness of the answer it is still important.) – Adrian Maire Jan 08 '21 at 08:17
  • It's a legal restriction. Google for it; it went through the press pretty thoroughly when nVidia introduced the policy. – Gerald Schneider Jan 08 '21 at 08:20
  • The other thing to consider with rack-mounted servers is that airflow is specifically focussed on cooling the CPUs and memory, with far less/no attention to higher-heat-generating PCIe cards. Most PCIe cards in servers are less than 100 W, such as NICs/HBAs/RAID adapters etc. GPUs obviously use several times as much; also, the way they tend to be mounted in 1U/2U servers would severely restrict incoming airflow to the front-mounted GPU fan(s), and many consumer GPUs don't blow the heated air out of the rear bracket but back into the 'case'. – Chopper3 Jan 08 '21 at 11:02

1 Answer


We use lots of GPUs in our servers - but there's a single rule to follow:

Only use parts explicitly supported for your exact server model by the manufacturer.

Do not break this rule.

Chopper3
  • Wow, that quite limits the number of options (e.g. some servers I see only list a few Quadro cards). Interesting. – Adrian Maire Jan 08 '21 at 09:47
  • I have seen no consumer card supported so far. Did you use any consumer-grade card? – Adrian Maire Jan 08 '21 at 10:27
  • Yes, yes it is, but this site is for professional sysadmins and designers; one of our key concerns is to create supportable solutions. And no, we use server-grade GPUs; never had any performance or reliability issues with these. – Chopper3 Jan 08 '21 at 10:29
  • I take this as a "Don't do that!" answer, leaving me with two options: buy server-grade GPUs (not an option, as I can't afford a >$50K server), or build a fully consumer-grade server in an ATX-compatible chassis. I will implement some ECC-like checking in software. – Adrian Maire Jan 08 '21 at 11:19
  • It's the 'putting the consumer GPUs inside the servers' bit that's the problem - one option around it would be to find an external PCIe enclosure. They usually connect back to the server via Thunderbolt, so you'd need an adapter for that inside the server; then you can fill out the enclosure however it's allowed to be filled. This is often referred to as an eGPU - i.e. external. Our dev guys do that with Macs and it works fine. Obviously Thunderbolt doesn't (I don't think) support dual connections, so you'd want to make sure you used a good cable and were careful with it, but that's a way out. – Chopper3 Jan 08 '21 at 11:23
  • I'm sure Supermicro has something that will fit consumer GPUs, though their official line is "For consumer-grade GPU card support, please contact Supermicro sales and technical support for details." – Michael Hampton Jan 08 '21 at 18:39