
I have a server with dual Xeon Scalable 6148 CPUs running an HPC application.

  • Base clock: 2.4 GHz
  • All-core turbo: 3.1 GHz

Some processing threads are not scaling well and are sensitive to CPU clock speed. I experimented a little with setting affinity and disabling HT on the cores running critical threads. But at some point I noticed a 10%+ performance difference between the two sockets. After some testing I found that the two sockets run at different clock speeds under load.
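For reference, this is roughly how I pinned threads and disabled HT siblings (the CPU numbers are illustrative; on this topology CPU N and CPU N+40 are the two HT siblings of one core):

```
# Pin a critical thread (by TID) to physical core 1 on socket 0
taskset -cp 1 <tid>

# Find the HT sibling of core 1 (CPU 41 on this machine)
cat /sys/devices/system/cpu/cpu1/topology/thread_siblings_list

# Take the sibling offline so the core runs only the critical thread
echo 0 | sudo tee /sys/devices/system/cpu/cpu41/online
```

Here is a fragment of the turbostat output: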

```
Package Core    CPU     Avg_MHz Busy%   Bzy_MHz TSC_MHz IRQ     SMI     C1      C1E     C6      C1%     C1E%    C6%     CPU%c1  CPU%c6  CoreTmp PkgTmp  Pkg%pc2 PkgWatt RAMWatt PKG_%   RAM_%
-       -       -       1468    51.37   2864    1596    233634  0       1186    5607    24389   0.04    0.80    47.53   48.63   0.00    73      73      0.00    299.11  102.75  0.00    0.00
0       0       0       2738    99.46   2759    1596    5059    0       0       0       0       0.00    0.00    0.00    0.54    0.00    68      70      0.00    149.54  55.11   0.00    0.00
0       0       40      2738    99.46   2759    1596    5059    0       0       0       0       0.00    0.00    0.00    0.54
0       1       1       2738    99.48   2759    1596    5057    0       0       0       0       0.00    0.00    0.00    0.52    0.00    67
0       1       41      90      3.27    2755    1596    4889    0       153     776     4845    0.30    4.56    91.92   96.73
0       2       2       2738    99.46   2759    1596    5059    0       0       0       0       0.00    0.00    0.00    0.54    0.00    67
0       2       42      63      2.30    2739    1596    221     0       3       97      149     0.01    1.29    96.38   97.70
0       3       3       2737    99.45   2759    1596    5059    0       0       0       0       0.00    0.00    0.00    0.55    0.00    69

1       0       20      2954    99.54   2975    1596    5060    0       0       0       0       0.00    0.00    0.00    0.46    0.00    69      73      0.00    149.57  47.64   0.00    0.00
1       0       60      14      0.49    2972    1596    705     0       2       120     745     0.00    1.00    98.51   99.51
1       1       21      2953    99.53   2975    1596    5059    0       0       0       0       0.00    0.00    0.00    0.47    0.00    70
1       1       61      13      0.45    2981    1596    535     0       6       25      539     0.03    0.38    99.14   99.55
1       2       22      2954    99.55   2975    1596    5059    0       0       0       0       0.00    0.00    0.00    0.45    0.00    72
1       2       62      11      0.36    2978    1596    572     0       1       46      616     0.00    0.60    99.03   99.64
```

The difference is 200-300 MHz. Initially I suspected uneven load from the app, but what is shown above is just a dummy load using multiple instances of `yes > /dev/null &` (see the sketch below). Thermals seem to be fine on both CPUs. What can cause such a difference in speed under what appears to be a balanced load?
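For completeness, the dummy load was started roughly like this (the explicit pinning is illustrative; unpinned instances spread across both sockets the same way):

```
# One busy loop per logical CPU (80 on this machine), so both
# sockets see an identical, fully loaded pattern
for cpu in $(seq 0 79); do
    taskset -c "$cpu" yes > /dev/null &
done

# Watch per-core frequencies while the load runs
sudo turbostat --quiet --interval 5
```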

terion
  • That turbostat output shows 13 CPUs; what about the other 27 cores? Distribution of load matters: fewer threads within the same thermal budget means a higher boost clock. – John Mahowald May 28 '20 at 18:07
  • @JohnMahowald The other cores show exactly the same pattern; I cut the turbostat log for brevity. The load is only multiple instances of `yes > /dev/null &` (checked with 40 and 80 instances). This should balance perfectly across sockets (and generally does). – terion May 28 '20 at 21:42
  • The workload was the same on both physical CPUs? Do you know if the workload was AVX-512 compatible? – Vinícius Ferrão Jul 20 '21 at 00:34

0 Answers