
Before I get to my question, I'll go over what I am currently working with so you have a decent idea of what I've already done/tried.

I have a multithreaded usermode Windows Desktop Application that issues DeviceIoControl calls to a KMDF driver (purely software, no hardware). There are 5 separate threads that all make the same custom IOCTL call to the driver constantly. This request consists of:

  1. PsLookupProcessByProcessId to get the process to read memory from.
  2. MmCopyVirtualMemory to copy the requested memory into the supplied buffer.
  3. ObDereferenceObject to decrement the reference count.

The driver is currently doing this serially. The main bottleneck in my usermode application is waiting for the memory reads to complete, because every read has to finish before the scene can be "rendered".

I've reduced the number of DeviceIoControl requests as much as I can, so now I've been looking into overlapped I/O and allowing each thread to send requests asynchronously. My question is whether this is even worth trying, as I do not know if I can use multiple threads in my driver to read from different addresses at the same time.

ryanbowen
  • @HarryJohnston _Inside_ each thread, the reads depend on the preceding ones in order to return the correct data. The threads themselves are independent of one another, however. If I open the file as OVERLAPPED and inside my device control function wait on the IOCTL to complete or time out, will I still see a performance gain due to the driver handling the 5 threads non-serially? Essentially what I'm asking is, with overlapped I/O, is it the equivalent of one instance of my driver handling each thread, or is it still one driver getting a ton of requests from 5 threads and struggling to keep up? – ryanbowen Feb 08 '17 at 20:24
  • @HarryJohnston Sorry, I'm having trouble finding the right words to express my questions. I've been searching far and wide trying to find out what exactly opening a file as overlapped actually changes with respect to how the WDF handles IOCTL requests, but all I've found is implementations. Does each request spawn a "thread" (or equivalent) so requests get handled in parallel in my driver, or does my driver's dispatch just work through the queue sequentially, leaving it up to me to implement the parallelization of reads? – ryanbowen Feb 08 '17 at 23:05
  • OK, forget most of my previous comments - I hadn't quite clicked that you're using WDF, so I/O requests go into a queue rather than being sent directly to your code. My mistake; sorry. – Harry Johnston Feb 08 '17 at 23:18
  • `The driver is currently doing this serially` - so you use `WdfIoQueueDispatchSequential`? Why not `WdfIoQueueDispatchParallel`? – RbMm Feb 09 '17 at 00:02
  • As I understand it, user mode must create the file on your device with the `FILE_FLAG_OVERLAPPED` flag (or without the `FILE_SYNCHRONOUS_IO_[NON]ALERT` flags). The driver, for its part, must use the `WdfIoQueueDispatchParallel` queue type. – RbMm Feb 09 '17 at 00:06

2 Answers


OK, it looks like the most important part of your question is here:

I've been searching far and wide trying to find out what exactly opening a file as overlapped actually changes with respect to how the WDF handles IOCTL requests [...]

It doesn't change anything; all requests to device drivers are asynchronous.

When you perform I/O on a synchronous handle, Windows issues an asynchronous I/O request to the driver on your behalf and waits for it to complete. As far as I know, the driver doesn't even have any way to tell whether the original request was synchronous or overlapped. [Edit: this isn't really true. As RbMm points out in the comments, the kernel does in fact draw a distinction between synchronous and asynchronous I/O, but from a practical standpoint this shouldn't matter to you.]

Anyway, if the driver is currently only running on a single thread, using overlapped I/O won't help. You will have to modify the driver. Conversely, modifying the driver should be sufficient; you probably don't need to change the application. (Exception: I'm not sure whether or not it is legal to use the same synchronous handle simultaneously from multiple threads, so I recommend that each thread open its own handle to the device, at least until you're sure the driver is working as desired.)

I'm not familiar with WDF, but the MSDN entry Dispatching Methods for I/O Requests looks relevant.

Harry Johnston
  • Actually, a driver can determine whether an operation is synchronous: [IoIsOperationSynchronous](https://msdn.microsoft.com/en-us/library/windows/hardware/ff548443(v=vs.85).aspx) – RbMm Feb 08 '17 at 23:46
  • @RbMm: noted; thanks. Do you have any idea of why a driver might distinguish between the two cases? Some sort of optimization, perhaps? – Harry Johnston Feb 09 '17 at 00:04
  • This is used primarily by filesystem drivers; look at the `fastfat` example. For example, the FS updates the current file position only if the operation is synchronous. – RbMm Feb 09 '17 at 00:10
  • `if (SynchronousIo && !PagingIo) { FileObject->CurrentByteOffset.QuadPart = StartingLbo + Irp->IoStatus.Information; }` – RbMm Feb 09 '17 at 00:11
  • But the primary reason a driver needs to determine whether an operation is sync or async is to know [if a thread can block for I/O or wait for a resource](https://github.com/Microsoft/Windows-driver-samples/blob/master/filesys/fastfat/fatprocs.h#L2319). Usually only fairly complex drivers do this. – RbMm Feb 09 '17 at 00:55
  • `Call the common set routine, with blocking allowed if synchronous` - if the operation is `synchronous`, the client will be blocked somewhere anyway until the request finishes, so the driver can wait itself. If the operation is asynchronous, the best practice is not to block the client: if the request cannot be completed right away, save it, return STATUS_PENDING, and complete it later. But all of this only applies to fairly complex cases. – RbMm Feb 09 '17 at 00:59

First of all, it matters a great deal how user mode opens the file: in synchronous or asynchronous mode? (FILE_FLAG_OVERLAPPED for CreateFile, or FILE_SYNCHRONOUS_IO_[NON]ALERT for ZwOpenFile or ZwCreateFile.)

If the file is opened in synchronous mode (FO_SYNCHRONOUS_IO is set in FILE_OBJECT.Flags), the I/O subsystem serializes all requests on that file, so it does not send a new request to your device until the previous one has finished. An asynchronous file object has no such restriction: each request (IRP) is simply sent to your device.

If, as you say,

The threads themselves are independent of one another however.

then you need to open the file as asynchronous (with FILE_FLAG_OVERLAPPED) if the threads share a single file handle (FILE_OBJECT); otherwise every thread must open its own private file on your device. I think a shared asynchronous file is the better option.
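
As a rough user-mode sketch of the shared asynchronous handle, something like the following could work. The device name `\\.\MyDevice`, the `IOCTL_READ_MEMORY` code, the `READ_REQUEST` layout, and the helper names are all assumptions made for illustration; the question does not show the real ones. The Windows-specific part is guarded so the request helper can also be built on its own.

```c
#include <stdint.h>
#include <string.h>

#ifdef _WIN32
#include <windows.h>
#include <winioctl.h>
#endif

/* Hypothetical input layout; the real driver's request struct is not shown. */
typedef struct READ_REQUEST {
    uint64_t Address;    /* address to read in the target process */
    uint32_t ProcessId;  /* target process id */
    uint32_t Size;       /* number of bytes to read */
} READ_REQUEST;

/* Build a request with every byte initialized. */
static READ_REQUEST MakeReadRequest(uint32_t pid, uint64_t address, uint32_t size)
{
    READ_REQUEST req;
    memset(&req, 0, sizeof(req));
    req.Address = address;
    req.ProcessId = pid;
    req.Size = size;
    return req;
}

#ifdef _WIN32
/* Hypothetical IOCTL code; use the one your driver actually defines. */
#define IOCTL_READ_MEMORY \
    CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, METHOD_BUFFERED, FILE_ANY_ACCESS)

/* Open one handle in asynchronous mode; all threads may share it. */
static HANDLE OpenDeviceOverlapped(void)
{
    return CreateFileW(L"\\\\.\\MyDevice",          /* hypothetical device name */
                       GENERIC_READ | GENERIC_WRITE,
                       FILE_SHARE_READ | FILE_SHARE_WRITE,
                       NULL, OPEN_EXISTING,
                       FILE_FLAG_OVERLAPPED,        /* no FO_SYNCHRONOUS_IO */
                       NULL);
}

/* Issue one IOCTL and wait for it. Each call uses its own event, so calls
   from different threads do not wait on each other's completion. */
static BOOL ReadRemote(HANDLE device, const READ_REQUEST *req,
                       void *out, DWORD outSize, DWORD *bytesReturned)
{
    OVERLAPPED ov;
    BOOL ok;

    memset(&ov, 0, sizeof(ov));
    ov.hEvent = CreateEventW(NULL, TRUE, FALSE, NULL);  /* manual-reset event */
    if (ov.hEvent == NULL)
        return FALSE;

    ok = DeviceIoControl(device, IOCTL_READ_MEMORY, (LPVOID)req, sizeof(*req),
                         out, outSize, NULL, &ov);
    if (ok || GetLastError() == ERROR_IO_PENDING) {
        /* Completed already or still pending: collect the final result. */
        ok = GetOverlappedResult(device, &ov, bytesReturned, TRUE);
    }

    CloseHandle(ov.hEvent);
    return ok;
}
#endif
```

Each of the 5 threads would call `ReadRemote` on the same handle. Because the handle is asynchronous, the I/O manager should not hold one thread's request back until another thread's request has finished, even though each individual thread still waits for its own result.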

On the driver side, you must use the WdfIoQueueDispatchParallel queue dispatch type, so that each request (IRP) is simply taken, handled, and completed. (As I understand it, you do not forward the request to another driver or put it into another queue.)
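
A minimal sketch of that queue setup, assuming a KMDF driver built with the WDK and a default queue. `CreateParallelDefaultQueue` and `EvtIoDeviceControl` are placeholder names, not taken from the question; the latter stands for whatever IOCTL handler the driver already has.

```c
#include <ntddk.h>
#include <wdf.h>

/* Placeholder: the driver's existing IOCTL handler, defined elsewhere. */
EVT_WDF_IO_QUEUE_IO_DEVICE_CONTROL EvtIoDeviceControl;

/* Would typically be called from the driver's EvtDriverDeviceAdd callback
   after WdfDeviceCreate succeeds. */
NTSTATUS CreateParallelDefaultQueue(_In_ WDFDEVICE Device)
{
    WDF_IO_QUEUE_CONFIG queueConfig;

    /* Parallel dispatch: the framework presents requests to the handler as
       they arrive instead of waiting for the previous one to complete. */
    WDF_IO_QUEUE_CONFIG_INIT_DEFAULT_QUEUE(&queueConfig,
                                           WdfIoQueueDispatchParallel);
    queueConfig.EvtIoDeviceControl = EvtIoDeviceControl;

    /* The queue handle is not needed here, so none is requested. */
    return WdfIoQueueCreate(Device, &queueConfig,
                            WDF_NO_OBJECT_ATTRIBUTES, WDF_NO_HANDLE);
}
```

With parallel dispatch the handler can run on several threads at once, so any state it shares between requests would need its own locking. The lookup, copy, and dereference sequence described in the question works per request, so it may not need any.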

Essentially what I'm asking is, with overlapped IO, is it the equivalent of one instance of my driver handling each thread or is it still one driver getting a ton of requests from 5 threads and struggling to keep up?

You always have exactly one instance of the driver, and exactly as many devices as you create. If you create only one device, there is only that one device: all files are opened on it, and all requests (from any process or thread) are sent to that single device instance.

If you use the same file for all threads and it is a synchronous file, the I/O subsystem serializes all requests to the driver, which is bad for you. Clients of your driver (device) must open the file as asynchronous (or each client must open its own private file). On the driver side, you need the WdfIoQueueDispatchParallel queue dispatch type because, as I understand it, all requests are independent and you do not need synchronization between them.

RbMm