
I am trying to set up 5 GPUs on an X10SRH-CF to run TensorFlow, but I cannot get the board to boot with more than 3 cards plugged in. In some layouts it boots with 4 cards, but the OS (Ubuntu Server 16.04) sees only 3 of them. If I move a single card among all the slots it works fine, so the individual slots don't seem to be the problem.

As far as I can tell, it might be a problem with how the PCIe lanes are split between the CPU and the PCH, but I am not very experienced with these types of motherboards.

The only peripherals plugged in are the GPUs and 2 SSDs that are not in RAID.

Inxsible
Marin
  • From the looks of the question, this has nothing to do with IPMI. Please consider removing unnecessary tags. – Inxsible Sep 06 '17 at 14:54

1 Answer


The Supermicro X10SRH-CF provides the following PCIe slots:

  • 1x PCI-E 3.0 x4 (in x8) slot
  • 1x PCI-E 3.0 x8 (in x16) slot
  • 2x PCI-E 3.0 x8 slots
  • 1x PCI-E 2.0 x2 (in x4) slot
  • 1x PCI-E 2.0 x4 (in x8) slot

So the first thing to check is how many PCIe lanes your cards require. Are they all the same model of GPU? As you can see, the physical size of a slot DOES NOT directly correspond to the number of lanes available to that slot. E.g. the x16-size slot has only 8 lanes available, so a card that wants 16 lanes will run at half its link bandwidth there, which may or may not be a problem for specific types of GPU cards. So you'd have to make sure you have enough PCIe lanes available to support all your cards.
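To make the slot-size versus lane-count point concrete, here is a rough Python sketch that encodes the slot list above and works out the link width a card would get in each slot. It assumes the link simply settles at the smaller of the card's width and the slot's electrical width; the names `SLOTS` and `link_widths` are made up for this illustration, and it says nothing about whether the board will actually boot with every slot populated.

```python
# Slot list taken from the answer above.
# Each entry: (PCIe generation, electrical lanes, physical slot size).
SLOTS = [
    (3, 4, 8),    # PCI-E 3.0 x4 (in x8)
    (3, 8, 16),   # PCI-E 3.0 x8 (in x16)
    (3, 8, 8),    # PCI-E 3.0 x8
    (3, 8, 8),    # PCI-E 3.0 x8
    (2, 2, 4),    # PCI-E 2.0 x2 (in x4)
    (2, 4, 8),    # PCI-E 2.0 x4 (in x8)
]


def link_widths(card_lanes, slots=SLOTS):
    """Return the lane count a card would run at in each slot.

    Assumption: the link width is the smaller of what the card
    supports and what the slot is wired for electrically.
    """
    return [min(card_lanes, lanes) for _gen, lanes, _size in slots]


def total_electrical_lanes(slots=SLOTS):
    """Sum of the electrical lanes across all listed slots."""
    return sum(lanes for _gen, lanes, _size in slots)


if __name__ == "__main__":
    # A x16 card such as a GTX 1080 never gets more than x8 on this board.
    print(link_widths(16))
    print(total_electrical_lanes())
```

Even the x16-size slot only yields 8 lanes for a x16 card, and the two PCI-E 2.0 slots offer 2 and 4 lanes at the slower generation.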

EDIT: Also make sure your power supply is large enough to power the board, CPU, other peripherals and all 5 of your GTX 1080 cards. Each card appears to be rated at 180W. A safe bet would be to provision at least 1.5x that per card. That would need

1.5 x 180W x 5 = 1350W

And that is only for the 5 GPU cards. Make sure you have additional capacity for your board, CPU, drives and other peripherals. Also remember that power draw at boot is somewhat higher until the system gets going, so add a bit of tolerance for every component that draws power.
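The estimate above can be written as a small Python helper so it is easy to redo for a different card count or PSU. This is only a sketch of the rule of thumb in this answer (card rating times a 1.5x headroom factor); the function names and the `other_watts` figure for board, CPU and drives are placeholders you would have to fill in from your own hardware, not measured values.

```python
def gpu_power_budget(card_watts, num_cards, headroom=1.5):
    """Wattage to provision for the GPUs alone, using the 1.5x rule of thumb."""
    return card_watts * num_cards * headroom


def psu_is_sufficient(psu_watts, card_watts, num_cards, other_watts, headroom=1.5):
    """Check a PSU rating against the GPU budget plus everything else.

    other_watts is a user-supplied estimate for the board, CPU, drives
    and other peripherals (hypothetical input, not taken from a datasheet).
    """
    required = gpu_power_budget(card_watts, num_cards, headroom) + other_watts
    return psu_watts >= required


if __name__ == "__main__":
    # Five GTX 1080 cards at 180W each, as in the answer above.
    print(gpu_power_budget(180, 5))  # 1350.0
```

For example, `psu_is_sufficient(1200, 180, 5, other_watts=250)` returns `False`, since the GPU budget alone already exceeds 1200W.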

Inxsible
  • They are all the same (GTX 1080) and should work fine on x2 - I tried moving a single card on all slots and it worked. If I add 4 of them - problems appear. I am not sure though if the mb does some multiplexing or whatnot. – Marin Sep 07 '17 at 14:06
  • If all your cards work at x2, then your board should have enough PCIe lanes to go around, unless, as you mentioned, there is some multiplexing going on. Your motherboard manual might shed some light on that. I am taking a wild guess here, but how about trying with only 1 SSD? Can you then use 4 cards? It is very unlikely that the PCIe lanes are shared with the SATA ports, but still. – Inxsible Sep 07 '17 at 14:27
  • Other things to check would be : What type of RAM are you using? RDIMMs or LRDIMMs? ECC or Non-ECC? What processor and what is the BIOS version? Do you have enough RAM? – Inxsible Sep 07 '17 at 16:01
  • There is 32GB (2x16GB) of ECC memory, so I believe it should suffice. A card seems to work fine at x2 - I tried switching one card from slot to slot and it works in all of them. When I try the same with 3 cards (or more), depending on which slot(s) I use, I get a different number of cards working, or the boot even freezes on "scanning PCI bus, code 91" – Marin Sep 18 '17 at 13:59
  • Does your PSU have enough capacity to support 4+ cards in addition to whatever power your board, CPU and other peripherals require? What size PSU are you using? It looks like each card will require 180W of power. A safe bet would be to provision at least 1.5x that for each card. Now you need to figure out whether your power supply is large enough. – Inxsible Sep 18 '17 at 15:12
  • I have edited my answer above to include checking on the PSU and a very rough estimate of the amount you will need. – Inxsible Sep 18 '17 at 15:20