I have a server with dual Xeon Scalable 6148 CPUs running an HPC application.
- Base clock: 2.4 GHz
- All-core turbo: 3.1 GHz
Some processing threads do not scale well and are sensitive to CPU clock speed. I experimented a bit with setting affinity and disabling HT on the cores that run the critical threads, but at some point I noticed a 10%+ performance difference between the two sockets. After further testing I found that the two sockets run at different clock speeds under load.
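For context, the affinity and HT experiments were done roughly like this (CPU numbers and the program name are only illustrative; per the turbostat topology below, CPU N and CPU N+40 are HT siblings on this box):

# pin the critical process to one physical core (placeholder binary name)
taskset -c 3 ./my_hpc_app
# offline that core's HT sibling so nothing else gets scheduled on it
echo 0 | sudo tee /sys/devices/system/cpu/cpu43/online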
Here is a fragment of the output from turbostat:
Package Core CPU Avg_MHz Busy% Bzy_MHz TSC_MHz IRQ SMI C1 C1E C6 C1% C1E% C6% CPU%c1 CPU%c6 CoreTmp PkgTmp Pkg%pc2 PkgWatt RAMWatt PKG_% RAM_%
- - - 1468 51.37 2864 1596 233634 0 1186 5607 24389 0.04 0.80 47.53 48.63 0.00 73 73 0.00 299.11 102.75 0.00 0.00
0 0 0 2738 99.46 2759 1596 5059 0 0 0 0 0.00 0.00 0.00 0.54 0.00 68 70 0.00 149.54 55.11 0.00 0.00
0 0 40 2738 99.46 2759 1596 5059 0 0 0 0 0.00 0.00 0.00 0.54
0 1 1 2738 99.48 2759 1596 5057 0 0 0 0 0.00 0.00 0.00 0.52 0.00 67
0 1 41 90 3.27 2755 1596 4889 0 153 776 4845 0.30 4.56 91.92 96.73
0 2 2 2738 99.46 2759 1596 5059 0 0 0 0 0.00 0.00 0.00 0.54 0.00 67
0 2 42 63 2.30 2739 1596 221 0 3 97 149 0.01 1.29 96.38 97.70
0 3 3 2737 99.45 2759 1596 5059 0 0 0 0 0.00 0.00 0.00 0.55 0.00 69
1 0 20 2954 99.54 2975 1596 5060 0 0 0 0 0.00 0.00 0.00 0.46 0.00 69 73 0.00 149.57 47.64 0.00 0.00
1 0 60 14 0.49 2972 1596 705 0 2 120 745 0.00 1.00 98.51 99.51
1 1 21 2953 99.53 2975 1596 5059 0 0 0 0 0.00 0.00 0.00 0.47 0.00 70
1 1 61 13 0.45 2981 1596 535 0 6 25 539 0.03 0.38 99.14 99.55
1 2 22 2954 99.55 2975 1596 5059 0 0 0 0 0.00 0.00 0.00 0.45 0.00 72
1 2 62 11 0.36 2978 1596 572 0 1 46 616 0.00 0.60 99.03 99.64
The difference is 200-300 MHz (Bzy_MHz around 2759 on package 0 vs. around 2975 on package 1).
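While the load is running, the gap can also be spot-checked outside turbostat by reading cpufreq directly, e.g. one busy core per socket (CPU numbers are illustrative):

# current frequency in kHz of one core on each socket
cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
cat /sys/devices/system/cpu/cpu20/cpufreq/scaling_cur_freq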
Initially I suspected uneven load from the application, but what is shown above is just a dummy load using multiple instances of
yes > /dev/null &
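Something along these lines (the count is illustrative; the actual runs may also have pinned each instance with taskset):

# spawn one busy-loop process per logical CPU to be loaded
for i in $(seq 1 40); do yes > /dev/null & done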
Thermals seem to be fine for both CPUs (PkgTmp is 70 °C vs. 73 °C in the turbostat output above).
What could cause such a difference in clock speed under what appears to be a balanced load?