
We are stress testing an application and noticed a curious case where the Windows kernel appears to take over execution of the stress test. The application under test picks up system-wide I/O events, mostly CRUD operations on files, using a minifilter driver similar to this passthrough driver.

During the stress test using diskspd on one particular VM (Windows Server 2019 1809 17763.864 with no external AV or other security software installed), we noticed that the System process, with reserved PID 4, is "taking over" execution of the stress payload. The same seems to happen with a simple batch script (create, read, and delete a file in a loop). I have never seen anything like this on any other system, and we cannot reproduce this behaviour anywhere but on that single VM.
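For illustration, the simple loop is roughly equivalent to the sketch below. The original was a batch script; this Python version, the function name `crud_loop`, the payload size, and the iteration count are assumptions made only to show the shape of the workload.

```python
import os
import tempfile


def crud_loop(directory, iterations=100, payload=b"x" * 1024):
    """Create, read back, and delete one file per iteration.

    Returns the total number of bytes read. The file name mirrors the
    diskspd target (f1.tmp); everything else here is illustrative.
    """
    path = os.path.join(directory, "f1.tmp")
    total = 0
    for _ in range(iterations):
        with open(path, "wb") as f:   # create + write
            f.write(payload)
        with open(path, "rb") as f:   # read
            total += len(f.read())
        os.remove(path)               # delete
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(crud_loop(d))
```

Each iteration should show up in procmon as a create, write, read, and delete sequence against the same path.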

The "taking over" occurs a few seconds into the stress test and manifests itself in the following way:

  • Process id is changed to System reserved pid 4
  • Thread id is changed to another tid
  • User SID is changed from S-1-5-21-2874696658-2485333267-3621126573-500 to S-1-5-18
  • User is changed from win-saacuiping\administrator to NT AUTHORITY\SYSTEM

We have captured this in procmon

procmon dump

This feels like some sort of sandboxing, but I've never actually seen this before. The simplified command used for stress testing is:

diskspd.exe -c100b -b1K -t2 -d60 -w50 -W0 -Sh f1.tmp

Can anyone please explain why execution is suddenly passed from one process to the Windows kernel System process (PID 4)?

oleksii
  • If an asynchronous I/O request can't acquire the file/directory control block (FCB/DCB) without waiting, it has to queue a work item that dispatches the request on a system thread, which is allowed to wait. What happens if you omit -t2, i.e. only use a single thread per file? – Eryk Sun Jul 13 '20 at 14:59
  • @ErykSun tried without `-t2` and it is the same behaviour. This only happens on a single VM and we cannot reproduce it elsewhere using the same stress test command. I've not heard of FCB, and wiki says it's a predecessor to file handles. Wiki says support for FCB was not included in FAT32 (I believe we use NTFS) and Windows 95. It's a good guess though. I cannot comprehend how something like this is possible – oleksii Jul 14 '20 at 12:02
  • File control blocks (FCBs) and stream control blocks (SCBs) are fundamental to Windows filesystems. When you open a filesystem file, the I/O manager passes a [File object](https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/ns-wdm-_file_object) to the filesystem driver in the `IRP_MJ_CREATE` request, and typically the filesystem sets a pointer to a context control block (CCB) in the `FsContext2` field, for information that pertains to just this File object, and a pointer to the FCB or SCB in the `FsContext` field, which begins with an FsRTL `FSRTL_ADVANCED_FCB_HEADER` record. – Eryk Sun Jul 14 '20 at 13:59
  • I wasn't guessing per se. I'm reading the published [FastFat source code](https://github.com/microsoft/Windows-driver-samples/blob/master/filesys/fastfat/read.c) for FAT filesystems. An `IRP_MJ_READ` dispatches to `FatFsdRead` on the client thread, which calls `FatCommonRead`. You can see several places where this routine gives up if waiting isn't allowed (i.e. async access). It will do a `try_return( PostIrp = TRUE )`, which is a macro that goes to `try_exit:`, which calls `FatFsdPostRequest`, which calls `FatAddToWorkque`, which posts to a system work queue via `ExQueueWorkItem`. – Eryk Sun Jul 14 '20 at 14:23
  • The guessing part is whether, when `FatCommonRead` (in the case of a FAT filesystem) is called again on a thread in the system process, ProcMon sees the low-level IRPs issued to the volume device as originating from the System process, or attributes them as associated IRPs for the master IRP in the client process context. – Eryk Sun Jul 14 '20 at 14:29
  • @ErykSun Looking into the call stacks, it appears to be `TSFileShare.sys`, things like `TSFileShare.sys:TSSchedulerWorkerThread`, `TSFileShare.sys:ContinuePendedIo` and `TSFileShare.sys:_CompletePendedIo`. Seems like this was related to https://support.microsoft.com/en-gb/help/4494631/fair-share-technologies-enabled-by-default-in-remote-desktop-services. Disabling fair share in the registry removed this behaviour. We'll make a little write-up. – oleksii Jul 16 '20 at 09:04
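The "post the request to a worker thread when the caller cannot wait" pattern described in the comments above can be sketched as a toy model. This is not Windows or FastFat code: `Request`, `common_read`, `dispatch`, `worker`, and the thread name `system-worker` are all hypothetical, and a plain lock stands in for the FCB resource. It only shows why the same request can end up completing on a different thread than the one that issued it.

```python
import queue
import threading


class Request:
    """Stand-in for an I/O request; records which thread completed it."""

    def __init__(self, name):
        self.name = name
        self.completed_on = None
        self.done = threading.Event()


def common_read(req, fcb_lock, can_wait):
    """Try to service the request. Returns False if it must be posted."""
    if not fcb_lock.acquire(blocking=can_wait):
        return False  # could not get the resource without waiting
    try:
        req.completed_on = threading.current_thread().name
        req.done.set()
        return True
    finally:
        fcb_lock.release()


def dispatch(req, fcb_lock, work_queue):
    """Runs on the caller's thread, which is not allowed to wait."""
    if not common_read(req, fcb_lock, can_wait=False):
        work_queue.put(req)  # hand over to a thread that may wait


def worker(fcb_lock, work_queue):
    """Worker thread: allowed to block on the lock. None stops it."""
    while True:
        req = work_queue.get()
        if req is None:
            break
        common_read(req, fcb_lock, can_wait=True)


def run_demo():
    lock, wq = threading.Lock(), queue.Queue()
    t = threading.Thread(target=worker, args=(lock, wq),
                         name="system-worker", daemon=True)
    t.start()
    fast = Request("uncontended")
    dispatch(fast, lock, wq)          # completes inline on the caller
    slow = Request("contended")
    with lock:                        # simulate a busy FCB
        dispatch(slow, lock, wq)      # gets posted instead
    slow.done.wait(timeout=5)
    wq.put(None)
    t.join(timeout=5)
    return fast.completed_on, slow.completed_on


if __name__ == "__main__":
    print(run_demo())
```

In this model the uncontended request finishes on the calling thread, while the contended one finishes on the worker. Whether a tool such as ProcMon attributes the real, posted I/O to the System process is exactly the open question raised in the comments, and the call stacks in the last comment point at `TSFileShare.sys` rather than the filesystem in this particular case.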
