NOTE: There are a couple of close votes on this post, and I'm not sure why, since no one has commented. The question is about how our custom-written Java server and its socket reads interact with the Windows OS to produce this high system CPU usage, so I believe this is the right forum for it rather than a sysadmin site.
I have a custom Java server that serves TCP connections of a couple of different types (some web, some raw), and one of our customers is seeing occasional periods of high overall CPU and high system CPU.
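For context, the read path is conceptually the classic blocking-socket pattern below. This is a stripped-down illustration only, not our actual code; the port and the protocol handling are placeholders:

import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class SketchServer {
    public static void main(String[] args) throws Exception {
        ServerSocket listener = new ServerSocket(9000); // placeholder port
        while (true) {
            final Socket client = listener.accept();
            new Thread(new Runnable() { // one handler per connection
                public void run() {
                    try {
                        InputStream in = client.getInputStream();
                        byte[] buf = new byte[8192];
                        int n;
                        // Each read() below ends up as a kernel socket
                        // receive -- the system-CPU side of our question.
                        while ((n = in.read(buf)) != -1) {
                            // ... hand n bytes to protocol handling ...
                        }
                    } catch (Exception e) {
                        // connection dropped
                    } finally {
                        try { client.close(); } catch (Exception ignored) {}
                    }
                }
            }).start();
        }
    }
}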
We may have reproduced this on our test server, which is running Windows Server 2008 on VMware.
During a failover from one server to another, or during other network events that cause a number of clients (40-80) to re-establish their connections simultaneously, we see high CPU usage including very high system CPU (>50% average, with spikes over 70%). These clients typically maintain persistent TCP sockets to the server, so during the failover we would see perhaps 80-160 sockets being reconnected and serviced.
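If anyone wants to reproduce the load pattern, it amounts to something like the sketch below: dozens of clients all reconnecting at the same moment. The host, port and client count are placeholders:

import java.io.OutputStream;
import java.net.Socket;
import java.util.concurrent.CountDownLatch;

public class ReconnectStorm {
    public static void main(String[] args) throws Exception {
        final String host = "test-server"; // placeholder
        final int port = 9000;             // placeholder
        final int clients = 80;            // roughly what we see in production
        final CountDownLatch go = new CountDownLatch(1);
        for (int i = 0; i < clients; i++) {
            new Thread(new Runnable() {
                public void run() {
                    try {
                        go.await(); // release every client at the same moment
                        Socket s = new Socket(host, port);
                        OutputStream out = s.getOutputStream();
                        out.write("HELLO\r\n".getBytes("US-ASCII"));
                        out.flush();
                        s.close();
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }).start();
        }
        go.countDown(); // simulate the failover reconnect burst
    }
}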
We have used kernrate to profile during one of these periods and are having a hard time understanding the results.
Kernrate reports low idle time and high kernel time as we would expect during this period:
P0    K 0:00:39.405 (55.9%)  U 0:00:27.424 (38.9%)  I 0:00:03.603 ( 5.1%)
      DPC 0:00:01.528 ( 2.2%)  Interrupt 0:00:00.327 ( 0.5%)
      Interrupts= 114880, Interrupt Rate= 1631/sec.
P1    K 0:00:37.596 (53.4%)  U 0:00:29.281 (41.6%)  I 0:00:03.556 ( 5.0%)
      DPC 0:00:00.078 ( 0.1%)  Interrupt 0:00:00.624 ( 0.9%)
      Interrupts= 96016, Interrupt Rate= 1363/sec.
TOTAL K 0:01:17.002 (54.7%)  U 0:00:56.706 (40.3%)  I 0:00:07.160 ( 5.1%)
      DPC 0:00:01.606 ( 1.1%)  Interrupt 0:00:00.951 ( 0.7%)
      Total Interrupts= 210896, Total Interrupt Rate= 2994/sec.
Total Profile Time = 70434 msec
There is plenty of physical RAM available to both the host and the guest, and perfmon reports page faults ranging from zero to 200/sec, with maybe a couple of spikes of 800/sec during the 70-second period. The Java heap is set to 3 GB, and usage doesn't come close to that limit during the problem period (or at all). We don't believe garbage collection is running during this time (a way to confirm that is sketched after the table below).
BytesStart BytesStop BytesDiff.
Available Physical Memory , 6296776704, 6266650624, -30126080
Available Pagefile(s) , 12591370240, 12577845248, -13524992
Available Virtual , 8796052869120, 8796048171008, -4698112
Available Extended Virtual , 0, 0, 0
Committed Memory Bytes , 305979392, 303964160, -2015232
Non Paged Pool Usage Bytes , 81997824, 82333696, 335872
Paged Pool Usage Bytes , 164028416, 164057088, 28672
Paged Pool Available Bytes , 4147101696, 4149116928, 2015232
Free System PTEs , 33556099, 33556099, 0
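To rule garbage collection out more rigorously, we could enable GC logging. For HotSpot JVMs of this vintage (pre-Java 9) that would look roughly like the following; the log path and jar name are placeholders:

java -Xmx3g -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:C:\logs\gc.log -jar ourserver.jar

Any collection during the problem window would then show up timestamped in gc.log.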
We have made some efforts to consolidate I/O calls and increase the average bytes per I/O operation (a sketch of the batching approach follows the table below). Generally, though, the numbers don't seem too unreasonable:
Total Avg. Rate
Context Switches , 462496, 6566/sec.
System Calls , 835593, 11863/sec.
Page Faults , 32814, 466/sec.
I/O Read Operations , 747, 11/sec.
I/O Write Operations , 3792, 54/sec.
I/O Other Operations , 27565, 391/sec.
I/O Read Bytes , 382146, 512/ I/O
I/O Write Bytes , 684128, 180/ I/O
I/O Other Bytes , 890365, 32/ I/O
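The consolidation mentioned above amounts to queueing outgoing messages and flushing them with a single gathering write, so that many small messages cost one call into the kernel instead of one each. An illustrative sketch, not our actual code:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.util.ArrayList;
import java.util.List;

public class BatchingWriter {
    private final SocketChannel channel;
    private final List<ByteBuffer> pending = new ArrayList<ByteBuffer>();

    public BatchingWriter(SocketChannel channel) {
        this.channel = channel;
    }

    // Queue a message rather than writing it immediately.
    public void enqueue(ByteBuffer message) {
        pending.add(message);
    }

    // One gathering write flushes every queued buffer in a single
    // call into the kernel, raising the average bytes per I/O.
    public void flush() throws IOException {
        if (pending.isEmpty()) {
            return;
        }
        ByteBuffer[] bufs = pending.toArray(new ByteBuffer[pending.size()]);
        // A real implementation would loop here until all buffers
        // drain, since write() can be partial on non-blocking sockets.
        channel.write(bufs);
        pending.clear();
    }
}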
Kernrate reports the vast majority of time spent in the NTOSKRNL module and a rate of almost 10 million events per second:
OutputResults: KernelModuleCount = 131
Percentage in the following table is based on the Total Hits for the Kernel
ProfileTime 12806 hits, 65536 events per hit --------
Module Hits msec %Total Events/Sec
NTOSKRNL 10561 70429 82 % 9827282
AMDPPM 1001 70429 7 % 931456
HAL 701 70429 5 % 652298
VMXNET3N61X64 351 70428 2 % 326619
WIN32K 68 70428 0 % 63276
TCPIP 35 70428 0 % 32568
AFD 19 70428 0 % 17680
NETIO 17 70429 0 % 15818
VM3DMP 11 70428 0 % 10235
NTFS 6 70429 0 % 5583
NDIS 6 70429 0 % 5583
LSI_SAS 6 70429 0 % 5583
FLTMGR 6 70429 0 % 5583
VMMOUSE 5 70429 0 % 4652
I8042PRT 5 70429 0 % 4652
VMCI 3 70429 0 % 2791
CDD 2 70428 0 % 1861
DXGMMS1 1 70428 0 % 930
VMHGFS 1 70429 0 % 930
NSIPROXY 1 70428 0 % 930
The breakdown within the NTOSKRNL module is as follows:
ProfileTime 10561 hits, 65536 events per hit --------
Module Hits msec %Total Events/Sec
KeSynchronizeExecution 2485 70429 23 % 2312356
ExReleaseRundownProtectionCacheAwareEx 2099 70429 19 % 1953173
IoGetPagingIoPriority 1561 70429 14 % 1452550
memmove 1006 70429 9 % 936108
RtlDelete 952 70429 9 % 885860
memset 448 70429 4 % 416875
ExfAcquirePushLockExclusive 326 70429 3 % 303351
PoStartNextPowerIrp 280 70429 2 % 260547
RtlFindClearBits 239 70429 2 % 222395
KeBugCheckEx 199 70429 1 % 185174
KeWaitForMultipleObjects 92 70429 0 % 85608
FsRtlTeardownPerFileContexts 67 70429 0 % 62345
KeSetTimer 62 70429 0 % 57692
NtWaitForSingleObject 59 70429 0 % 54901
wcsncat_s 58 70429 0 % 53970
What I don't understand here is how what seems like a reasonable number of I/O operations per second (<500/sec, if I understand kernrate properly) translates into millions of events per second for KeSynchronizeExecution, ExReleaseRundownProtectionCacheAwareEx and IoGetPagingIoPriority, which are presumably the source of the high kernel CPU?
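For what it's worth, my working assumption is that kernrate's "events" here are profile-timer clock events rather than I/O operations; the arithmetic seems to support that reading:

Events/Sec = (Hits x 65536 events per hit) / elapsed time
KeSynchronizeExecution: (2485 x 65536) / 70.429 s =~ 2,312,356/sec

If that's right, these figures measure where the sampled CPU time went rather than counting calls, so they don't contradict the modest I/O rates above.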
I have searched for more information on these functions, but details are thin on the ground about what they do and what could drive high CPU usage in them.