
I have a Windows console application that uses a parallel IO card (General Standards HPDI32ALT) for high-speed data transmission.

My process runs in user mode, but I am sure there is kernel-mode driver activity somewhere behind the device's API (PCI DMA transfers, reading device status registers, etc.). The working model is roughly this:

  • at startup: I request a pointer to an IO buffer from the API.
  • in my main loop:
    • block on the API, waiting for room in the device's buffer (low watermark)
    • fill the IO buffer with transmission data
    • begin transmission to the device by passing it the pointer to the IO buffer (during this time the API uses DMA on the PCI bus to move the data to the card)
    • block on the API, waiting for the IO to complete
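
The loop above can be sketched as follows. This is only an illustration of the control flow: `FakeCard` and all of its methods are hypothetical stand-ins invented for this sketch, not the General Standards API, and nothing here performs real DMA or blocking.

```python
class FakeCard:
    """Hypothetical stand-in for the vendor API; none of these names are real."""

    def __init__(self, buffer_size):
        # Startup: the API hands back one IO buffer that is reused for every transfer.
        self.io_buffer = bytearray(buffer_size)
        self.bytes_sent = 0
        self._pending = 0

    def wait_for_low_watermark(self):
        # The real API would block here until the device buffer has room.
        pass

    def begin_transmit(self, nbytes):
        # The real API would start a DMA transfer of nbytes from io_buffer.
        self._pending = nbytes

    def wait_for_io_complete(self):
        # The real API would block here until the DMA transfer finishes.
        self.bytes_sent += self._pending
        self._pending = 0


def run_transmit_loop(card, chunks):
    buf = card.io_buffer  # requested once, at startup
    for chunk in chunks:
        if len(chunk) > len(buf):
            raise ValueError("chunk does not fit in the IO buffer")
        card.wait_for_low_watermark()    # block: room in the device buffer
        buf[:len(chunk)] = chunk         # fill the IO buffer (user-mode writes)
        card.begin_transmit(len(chunk))  # hand the buffer to the device
        card.wait_for_io_complete()      # block: IO complete
    return card.bytes_sent
```

The only step in which my own code touches the buffer's pages is the fill step; everything else happens behind the API.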

The application appears to be working correctly, with the proper data rate and sustained throughput for long periods of time. However, when I look at the process in the Sysinternals tool Process Explorer, I see a large number of page faults (~6k per second). I am moving ~30 MB/s to the card.

I have plenty of RAM and am reasonably sure the page faults are not disk IO related.

Any thoughts on what could be causing the page faults? I also have a receive side to this application that uses an identical IO card in receive mode. The receive-mode use of the API does not cause a large number of page faults.

Could the act of moving the IO buffer to kernel mode cause page faults?

JeffV

1 Answer


So your application asks the driver for a memory buffer and you copy the send data into that buffer? That's a pretty strange model; usually you let the application manage the buffers.

If you're seeing 6K page faults/s while transferring only 30 MB/s, you're getting almost one page fault for every page you transfer (assuming 4 KB pages, 30 MB/s is roughly 7,500 pages per second). When you get the data buffer from the driver, is it always zero-filled? I'm wondering if you're getting demand-zero faults on every transfer.
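
To make that estimate concrete, here is a quick back-of-the-envelope check. It assumes 4 KiB pages (the usual x86/x64 Windows page size) and takes "30 MB/s" as 30 × 1024 × 1024 bytes/s; both are assumptions on my part, and the function name is just for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def faults_per_page(faults_per_sec, bytes_per_sec, page_size=PAGE_SIZE):
    """Return how many page faults occur per page of data transferred."""
    pages_per_sec = bytes_per_sec / page_size
    return faults_per_sec / pages_per_sec


# ~6k faults/s and ~30 MB/s, as reported in the question
ratio = faults_per_page(6_000, 30 * 1024 * 1024)
print(f"{ratio:.2f} faults per page transferred")  # prints 0.78
```

A ratio that close to 1 is what you would expect if nearly every page of the buffer is faulted in once per transfer.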

-scott

snoone
  • The manual says that using the IO buffer lets the API allocate a block of memory that is contiguous in physical memory, for DMA performance reasons. – JeffV Aug 10 '11 at 16:18