We are currently investigating high disk latency on a Windows Server 2012 R2 machine that runs as a SQL Server host. It is a virtual machine under VMware, and the datastore backing the faulty disk sits on a very high-performance LUN on a SAN.
The SAN shows very good response times for the LUN, even during incidents and during my tests. The datastore also shows very good response times at all times. CPU and memory are not the bottleneck; I have double-checked.
Microsoft suggested we use diskspd to test our disk performance. Here are the results of two sets of tests. I have run them several times, over longer intervals and at different times of day, so I am sure the results are not incidental.
Command Line: diskspd -b64k -o32 -t4 -d60 -w50 -Sw -r -L -c20G -Z1G C:\iotest.data
Total IO:
      bytes |   I/Os |   MB/s | I/O per s | AvgLat | LatStdDev
12623020032 | 192612 | 200.59 |   3209.46 | 38.636 |    21.687
Command Line: diskspd -b64k -o32 -t4 -d60 -w50 -Su -r -L -c20G -Z1G C:\iotest.data
Total IO:
      bytes |    I/Os |    MB/s | I/O per s | AvgLat | LatStdDev
78517239808 | 1198078 | 1247.71 |  19963.34 |  6.410 |     8.202
According to the documentation, -Sw enables write-through I/O and -Su disables software caching. Let me specify that with or without -Sw, the first command line gives the same results, which tells me this flag does not have much impact here. From this tool (created and maintained by the Windows team), we could conclude that the software cache (disabled with -Su) is ruining disk performance, but this does not seem right.
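For context on what those flags map to at the API level: as far as I understand, -Su corresponds to opening the file with FILE_FLAG_NO_BUFFERING and -Sw to FILE_FLAG_WRITE_THROUGH, and SQL Server reportedly opens its data files with both flags. A minimal sketch of how an application would request the same behavior (the file path is just my test file; this is an illustration, not how diskspd itself is written):

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        /* FILE_FLAG_NO_BUFFERING  : bypass the OS file cache (what -Su simulates).
           FILE_FLAG_WRITE_THROUGH : write through intermediate caches to the media
                                     (what -Sw requests).
           Note: with NO_BUFFERING, offsets, lengths, and buffers must be
           sector-aligned. */
        HANDLE h = CreateFileA(
            "C:\\iotest.data",            /* hypothetical test file */
            GENERIC_READ | GENERIC_WRITE,
            0,                            /* no sharing */
            NULL,
            OPEN_EXISTING,
            FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH,
            NULL);

        if (h == INVALID_HANDLE_VALUE) {
            fprintf(stderr, "CreateFile failed: %lu\n", GetLastError());
            return 1;
        }

        CloseHandle(h);
        return 0;
    }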
My questions are:
Why would software caching lower performance?
Does it impact running applications (e.g. SQL Server) the same way it impacts this test? (See the command sketch after this list.)
IOMeter gives me the same performance as the test without software caching; why?
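For reference, if I read the documentation correctly, a run that combines both flags (-Suw, which the docs say is equivalent to -Sh) with SQL Server's default 8 KB page size should approximate its data-file I/O pattern more closely; the other parameters are carried over from my tests above:

    diskspd -b8k -o32 -t4 -d60 -w50 -Suw -r -L -c20G -Z1G C:\iotest.data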
Thanks,