Motivation:
First of all, even though I have some knowledge of computer science, software development, and Linux server administration, I have never worked with server hardware and am a total "newbie" to it. Sorry if this question is trivial to most of you.
I am developing software with fairly intensive (single-precision) computing needs. To reach the required TFLOPS, I chose the OpenCL 2.1 framework and perform most of the computation on a high-end consumer AMD graphics card, using the CPU mostly to drive the GPU (Linux OS). I am now looking to scale this out across multiple machines.
While deciding how to organize those machines, it quickly became evident that standard (consumer) ATX towers are not ideal: each brand has its own chassis shape, and they basically cannot be stacked easily and conveniently in a 19" enclosure with good cooling airflow, a shared UPS (APC), tidy cabling, etc.
With this in mind, I started looking into getting a rack cabinet with servers, and found that:
- GPUs designed for HPC, like Instinct/Tesla, cost an order of magnitude more than consumer GPUs, mostly because they offer fast double-precision floating point (which is "slow" on consumer devices), and because they can be sold at that price to enterprises.
- Even with those GPUs, a PCI-Express riser/spacer is needed.
- GPU-ready servers only accept graphics cards up to 2 slots wide (current high-end consumer GPUs are usually 3 slots wide).
- I found ATX 3U and 4U chassis designed for 19" cabinets. But mounting one of those with consumer hardware would mean giving up ECC memory, redundant power (UPS), etc.
The question:
What should I consider when buying a server intended to host 1 or 2 consumer-grade GPUs?
I have already spent a lot of time searching the internet and could not get a basic understanding of the question. For example, here are some of the doubts that come to mind:
- Is it a good idea, or even possible at all? Several web pages complain about the difficulty of making such systems work together: incompatibilities, driver issues, etc.
- Can any 2U-3U server chassis hold a 3-slot graphics card? Or two of them?
- Some servers (e.g. Gigabyte Gxxx) are specifically designed for HPC with GPUs; do they really make a difference compared to standard servers (for instance HPE ProLiant, IBM, etc.)?
- Do most servers support the PCI-Express v4 x16 slot required by consumer-grade GPU cards?
- Is the airflow in a server enclosure compatible with a consumer-grade GPU card (which usually has 3 fans on its underside)?
- Are there any problems with the power connections?