
We have a PowerEdge R7525 server with an NVIDIA A16 graphics card running Debian 11, but we get about 50% lower GPU performance than on our other servers. I suspect the cause is the missing "Above 4G decoding" option in the BIOS. According to NVIDIA, this server should handle up to three A16 GPU units. Can anyone advise me on a workaround or anything else that would let us harness the full power of this GPU?

Thank you very much in advance

Aotor

1 Answer


(I work for Dell; specifically, I do a lot of optimization work.)

I think you're tracking a bit off course. "Above 4G decoding" is a feature left over from when BIOS PCIe memory enumeration was limited to 32 bits, which is no longer the case and hasn't been for quite some time. The addressing is now native 64-bit.

"But we have about 50% lower gpu performance than other servers."

I'm not sure what you mean by this. I may be reading too much into this, but this statement makes me think this may be your first foray into optimization in which case, awesome! It's a complicated but fascinating world. GPU performance can be measured in myriad different ways so this statement on its own doesn't narrow down what the problem is.

As for why you're seeing poor performance, this is an enormously complex question on which people write entire books. Some common mistakes I see people make, particularly on AMD-based servers:

  • Failing to account for PCIe lane / processor alignment. Make sure whatever processes you're running against the GPU are assigned to the processor that owns the GPU's PCIe lanes rather than the distant processor.
  • Failing to set the NUMA nodes per socket (NPS) setting appropriately for the workload (this is unique to AMD systems like the R7525).
  • Failing to account for bottlenecks elsewhere. For example, I've had people see poor GPU performance when in reality part of their software was storage I/O bound.
  • Maybe this is obvious, but try setting the BIOS profile to performance. If you set it to power saver, that can lead to downclocking when you don't want it.
  • Poorly aligned memory transfers.
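To make the first two points concrete, here is a minimal sketch (not a Dell tool, just an illustration) of how you could check which NUMA node and CPUs are local to each NVIDIA device on Linux. It assumes the standard sysfs attributes `vendor`, `class`, `numa_node` and `local_cpulist` under `/sys/bus/pci/devices`; the function names are made up for this example.

```python
import os
from pathlib import Path

NVIDIA_VENDOR_ID = "0x10de"  # PCI vendor ID used by NVIDIA devices


def parse_cpulist(text):
    """Turn a kernel CPU list such as '0-7,64-71' into a set of CPU numbers."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def read_attr(device, name):
    """Read one sysfs attribute; return '' if it is missing or unreadable."""
    try:
        return (device / name).read_text().strip()
    except OSError:
        return ""


def find_nvidia_gpus(sysfs_root="/sys/bus/pci/devices"):
    """List NVIDIA display-class PCI devices with their NUMA node and local CPUs."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []
    gpus = []
    for device in sorted(root.iterdir()):
        if read_attr(device, "vendor") != NVIDIA_VENDOR_ID:
            continue
        # PCI class 0x03xxxx is "display controller" (VGA or 3D controller).
        if not read_attr(device, "class").startswith("0x03"):
            continue
        gpus.append({
            "address": device.name,
            # The kernel reports -1 when it has no NUMA information.
            "numa_node": int(read_attr(device, "numa_node") or -1),
            "cpus": parse_cpulist(read_attr(device, "local_cpulist")),
        })
    return gpus


def pin_current_process(cpus):
    """Restrict this process (and children it starts later) to the given CPUs."""
    if cpus and hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, cpus)


if __name__ == "__main__":
    for gpu in find_nvidia_gpus():
        print(gpu["address"], "NUMA node", gpu["numa_node"],
              "local CPUs", sorted(gpu["cpus"]))
```

If the reported node differs between your good and bad servers, or the transcoding processes run on CPUs outside the local list, that is worth investigating. Instead of pinning from Python you can also launch the workload with `numactl --cpunodebind=<node> --membind=<node>`, using the node printed above.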

Optimization is extremely workload specific. If this is the first time you've gone through it, I would focus on really understanding exactly how the data flows and where it might be bottlenecking. Try to identify things that seem out of place. For example, if you think GPU performance is low, what is the GPU's utilization? Is it at 100%? If it is close to 100%, I start to lean towards software problems. If it's not at 100%, why not? Are you not feeding it data fast enough? Is the card underpowered? Is the server overheating? And so on.
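As a starting point for those questions, the sketch below polls `nvidia-smi` once and flags two things: a GPU that looks saturated, and a PCIe link that is running below its maximum generation or width. The query fields are standard `nvidia-smi --query-gpu` properties; the 95% threshold and the function names are assumptions chosen for illustration. Note that the link generation can legitimately drop while a GPU is idle, so check it under load.

```python
import csv
import io
import shutil
import subprocess

FIELDS = [
    "index", "name", "utilization.gpu", "temperature.gpu",
    "pcie.link.gen.current", "pcie.link.gen.max",
    "pcie.link.width.current", "pcie.link.width.max",
]


def parse_gpu_csv(text):
    """Parse 'csv,noheader,nounits' output into one dict per GPU."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(record) != len(FIELDS):
            continue  # skip blank or unexpected lines
        rows.append(dict(zip(FIELDS, (value.strip() for value in record))))
    return rows


def to_int(value):
    """Convert a field to int; nvidia-smi prints e.g. '[N/A]' when unsupported."""
    try:
        return int(value)
    except ValueError:
        return None


def flag_issues(row, busy_threshold=95):
    """Return human-readable warnings for one GPU row (threshold is arbitrary)."""
    issues = []
    util = to_int(row["utilization.gpu"])
    if util is not None and util >= busy_threshold:
        issues.append(f"GPU {row['index']}: utilization {util}% looks saturated")
    for what in ("gen", "width"):
        current = to_int(row[f"pcie.link.{what}.current"])
        maximum = to_int(row[f"pcie.link.{what}.max"])
        if current is not None and maximum is not None and current < maximum:
            issues.append(
                f"GPU {row['index']}: PCIe link {what} {current} below max {maximum}"
            )
    return issues


def query_gpus():
    """Run nvidia-smi once; return [] if it is missing or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return parse_gpu_csv(result.stdout) if result.returncode == 0 else []


if __name__ == "__main__":
    for row in query_gpus():
        for line in flag_issues(row) or [f"GPU {row['index']}: nothing flagged"]:
            print(line)
```

Running this on both a good and a bad server while the same workload is active gives you a like-for-like comparison, which is usually more informative than any single reading.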

Grant Curell
  • Hello, first of all, I want to thank you for your time. The 50% lower GPU performance refers to transcoding. We have several other servers with a very similar configuration but on Supermicro hardware (notably the motherboard): same platform, same installation, same CPU, same GPU. On this Dell we can only transcode 20-24 channels on the GPU without errors, while on the other servers we have no problem even with 40 channels. – Aotor Aug 29 '23 at 11:03
  • The server and GPU are not overheating; the GPU stays between 37 °C and 43 °C. Utilization is 97%-100%, whereas on our other servers it is around 65%-75% when running 40 ffmpeg processes. – Aotor Aug 29 '23 at 11:06
  • Can you please explain this point in more detail? "Failing to set NUMA's per core appropriately for the workload (this is unique to AMD systems like the R7525)" – Aotor Aug 29 '23 at 11:09
  • Thank you very much in advance. – Aotor Aug 29 '23 at 11:09
  • I don't know what processor you're running, but if you're marching into optimization territory on Rome/Milan/Genoa/newer you need to be very familiar with the NUMA topology. I would start here: https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/tuning-guides/redhat-enterprise-linux-tuning-guide-amd-epyc7003-series-processors.pdf – Grant Curell Aug 30 '23 at 23:32
  • See here to get started on understanding the NUMA nodes per socket setting. It has massive performance implications. If you have the same processor on Supermicro, I expect you have tuned it there if it's working well for you, so I would start with whatever you're using in the good setup. If you don't see NUMA nodes per socket on the Supermicro, then either you don't have the same CPU or Supermicro is hiding major options from you. https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/white-papers/overview-amd-epyc7003-series-processors-microarchitecture.pdf#page=10 – Grant Curell Aug 30 '23 at 23:35
  • Thank you again for your time. The CPU in this server is an AMD EPYC 75F3. – Aotor Aug 31 '23 at 08:09
  • Thanks for the docs. I'll go through them and hope they help. – Aotor Aug 31 '23 at 08:10