
I have a server with dual Xeon Scalable 6148 CPUs running an HPC application.

  • Base clock: 2.4 GHz
  • All-core turbo: 3.1 GHz

Some processing threads are not scaling well and are sensitive to CPU clock speed. I experimented a little with setting affinity and disabling HT on the cores running critical threads. But at some point I noticed a 10%+ performance difference between the two sockets. After some testing I found that the two sockets run at different clock speeds under load.
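For reference, this is roughly how I pinned threads and disabled HT siblings (the CPU numbers are illustrative; on this topology CPU N and CPU N+40 are the two HT siblings of one core):

```
# Pin a critical thread (by TID) to physical core 1 on socket 0
taskset -cp 1 <tid>

# Find the HT sibling of core 1 (CPU 41 on this machine)
cat /sys/devices/system/cpu/cpu1/topology/thread_siblings_list

# Take the sibling offline so the core runs only the critical thread
echo 0 | sudo tee /sys/devices/system/cpu/cpu41/online
```

Here is a fragment of the turbostat output: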

```
Package Core    CPU     Avg_MHz Busy%   Bzy_MHz TSC_MHz IRQ     SMI     C1      C1E     C6      C1%     C1E%    C6%     CPU%c1  CPU%c6  CoreTmp PkgTmp  Pkg%pc2 PkgWatt RAMWatt PKG_%   RAM_%
-       -       -       1468    51.37   2864    1596    233634  0       1186    5607    24389   0.04    0.80    47.53   48.63   0.00    73      73      0.00    299.11  102.75  0.00    0.00
0       0       0       2738    99.46   2759    1596    5059    0       0       0       0       0.00    0.00    0.00    0.54    0.00    68      70      0.00    149.54  55.11   0.00    0.00
0       0       40      2738    99.46   2759    1596    5059    0       0       0       0       0.00    0.00    0.00    0.54
0       1       1       2738    99.48   2759    1596    5057    0       0       0       0       0.00    0.00    0.00    0.52    0.00    67
0       1       41      90      3.27    2755    1596    4889    0       153     776     4845    0.30    4.56    91.92   96.73
0       2       2       2738    99.46   2759    1596    5059    0       0       0       0       0.00    0.00    0.00    0.54    0.00    67
0       2       42      63      2.30    2739    1596    221     0       3       97      149     0.01    1.29    96.38   97.70
0       3       3       2737    99.45   2759    1596    5059    0       0       0       0       0.00    0.00    0.00    0.55    0.00    69

1       0       20      2954    99.54   2975    1596    5060    0       0       0       0       0.00    0.00    0.00    0.46    0.00    69      73      0.00    149.57  47.64   0.00    0.00
1       0       60      14      0.49    2972    1596    705     0       2       120     745     0.00    1.00    98.51   99.51
1       1       21      2953    99.53   2975    1596    5059    0       0       0       0       0.00    0.00    0.00    0.47    0.00    70
1       1       61      13      0.45    2981    1596    535     0       6       25      539     0.03    0.38    99.14   99.55
1       2       22      2954    99.55   2975    1596    5059    0       0       0       0       0.00    0.00    0.00    0.45    0.00    72
1       2       62      11      0.36    2978    1596    572     0       1       46      616     0.00    0.60    99.03   99.64
```

The difference is 200-300 MHz. Initially I suspected uneven load from the app, but what is shown above is just a dummy load using multiple instances of `yes > /dev/null &` (see the sketch below). Thermals seem to be fine on both CPUs. What can cause such a difference in speed under what appears to be a balanced load?
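For completeness, the dummy load was started roughly like this (the explicit pinning is illustrative; unpinned instances spread across both sockets the same way):

```
# One busy loop per logical CPU (80 on this machine), so both
# sockets see an identical, fully loaded pattern
for cpu in $(seq 0 79); do
    taskset -c "$cpu" yes > /dev/null &
done

# Watch per-core frequencies while the load runs
sudo turbostat --quiet --interval 5
```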

terion
  • That turbostat output shows 13 CPUs; what about the other 27 cores? Distribution of load matters: fewer threads within the same thermal budget means a higher boost clock. – John Mahowald May 28 '20 at 18:07
  • @JohnMahowald The other cores show exactly the same pattern; I cut the turbostat log for brevity. The load is only multiple instances of `yes > /dev/null &` (checked with 40 and 80 instances). This should balance perfectly across sockets (and generally does). – terion May 28 '20 at 21:42
  • The workload was the same on both physical CPUs? Do you know if the workload was AVX-512 compatible? – Vinícius Ferrão Jul 20 '21 at 00:34

0 Answers