AMD Epyc, occassional crappy performance

Question

I'm working on a 2x AMD EPYC 7451 server, with scaling governor set to performance, yet there are issues when the server is under low load. When the load is low all the cores are basically downscaled to 600-1000MHz and the response times skyrocket to 2-3x normal values, while CPU load isn't even reported that low because all cores are running using these crappy clocks...

So what I think should happen when load is low, is that the kernel process scheduler could just put all running threads on one NUMA node and scaling governor max these couple CPU cores, so like 24 cores would run at 2.8GHz, and the rest can be at 600MHz idling. Or it could ad least keep all the cores at normal speeds.

What is actually happening is that each core is set to 600-1000 and then the kernel seem to round-robin threads on all of these slow 96 cores, which is actually even kind of funny, because it would be hard to come up with better way of wasting energy, creating unneded load on infinity fabric and killing performance at the same time. And even serving the requests 3 TIMES slower as when the server has high load.

I don't want to disregard AMD SPUs because it looks like kernel issue. For Intel it also works like this, but ONLY when power-saving governors are used. Switching it to performance solves it which is kind-of logical I guess. I'm not sure why this server, when set to performance mode, is managed by kernel like a cheap laptop on battery saving? Any ideas? Using Fedora 27 ATM...

Low Load:
cat /proc/cpuinfo | grep MHz
cpu MHz         : 685.117
cpu MHz         : 685.877
cpu MHz         : 656.451
cpu MHz         : 651.857
cpu MHz         : 622.491
cpu MHz         : 677.199
cpu MHz         : 702.872
cpu MHz         : 677.941
...

High Load:
cat /proc/cpuinfo | grep MHz
cpu MHz         : 2848.291
cpu MHz         : 2896.881
cpu MHz         : 2893.726
cpu MHz         : 2895.113
cpu MHz         : 2467.476
cpu MHz         : 2498.073
cpu MHz         : 2492.711
cpu MHz         : 2488.875
cpu MHz         : 2496.855
cpu MHz         : 2485.083
...

score 2 · Answer 1 · answered Feb 12 '19 at 13:17

2

Remove the power saving for the cpu/C4 setting for max performance in the BIOS, it will force the CPU to stay at the max settings.

I suspect a mainboard problem.

answered Feb 12 '19 at 13:17

yagmoth555

16,758
4
29
50

Thanks for suggestion. Will it still allow cores to be boosted? – Slawek Feb 12 '19 at 13:31
@Slawek I don't know, but surely, as it just remove the powersaving feature, as such the core will always be rated at their max value – yagmoth555 Feb 12 '19 at 14:51

score 1 · Answer 2 · answered Feb 13 '19 at 10:45

Sorry it's silly to answer my own question but after trying to find solution everything was updated to newest versions (fedora 29). And then all schedulers are working as expected. So powersave/ondemand keeps lower cpu speeds and performance no longer drops to 600MHz but keeps everything above 2.8GHz all the time even when server is unused, which is expected i think...

It just seems that FC27 is too old for this CPU...

AMD Epyc, occassional crappy performance

2 Answers2