I'm trying to use IOCP to do async file reads.

This works fine 99% of the time, but in the remaining 1% a request never seems to complete, so the operation waits forever.

The sequence of API calls for the file that hangs is:

#   Time of Day Relative Time   Thread  Module  Category    API Return Value    Error   Duration
1160    4:14:50.142 PM  0:00:00:308 1   ghc-stage2.exe  File Management CreateFileW ( "\\?\<path>\llvm-targets", GENERIC_READ, FILE_SHARE_DELETE | FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING, FILE_FLAG_OVERLAPPED | FILE_FLAG_SEQUENTIAL_SCAN, NULL )    0x00000000000001f0      0.0001230
1173    4:14:50.143 PM  0:00:00:309 1   ghc-stage2.exe  File Management CreateIoCompletionPort ( 0x00000000000001f0, 0x00000000000001e4, 496, 0 )   0x00000000000001e4      0.0000407
1179    4:14:50.143 PM  0:00:00:309 1   ghc-stage2.exe  File Management GetFileType ( 0x00000000000001f0 )  FILE_TYPE_DISK      0.0000039
1190    4:14:50.143 PM  0:00:00:309 1   ghc-stage2.exe  File Management SetFileCompletionNotificationModes ( 0x00000000000001f0, 3 )    TRUE        0.0000075
1224    4:14:50.144 PM  0:00:00:310 1   ghc-stage2.exe  Error Handling  GetLastError (  )   ERROR_IO_PENDING        0.0000009
1223    4:14:50.143 PM  0:00:00:309 1   ghc-stage2.exe  File Management ReadFile ( 0x00000000000001f0, 0x00000000074d7010, 8192, NULL, 0x00000000073670e0 ) FALSE   997 = Overlapped I/O operation is in progress.  0.0000811
1276    4:14:50.144 PM  0:00:00:310 5   ghc-stage2.exe  File Management GetQueuedCompletionStatusEx ( 0x00000000000001e4, 0x0000000007406340, 64, 0x0000000007367140, INFINITE, FALSE ) 
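
For readers following the trace: the `3` passed to `SetFileCompletionNotificationModes` is a bitmask, and it changes which `ReadFile` outcomes produce a completion packet. The sketch below spells out that decision. The constant values match the Windows SDK but are redefined under hypothetical `SKETCH_` names so the snippet compiles anywhere; the helper `expect_completion_packet` is an illustration, not code from the program being traced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same numeric values as the Windows SDK definitions; redefined under
   hypothetical SKETCH_ names only so this compiles on any platform. */
#define SKETCH_FILE_SKIP_COMPLETION_PORT_ON_SUCCESS 0x1u
#define SKETCH_FILE_SKIP_SET_EVENT_ON_HANDLE        0x2u
#define SKETCH_ERROR_IO_PENDING                     997u

/* Hypothetical helper: given the result of an overlapped ReadFile on a
   handle bound to a completion port, decide whether a completion packet
   will be queued to the port for this request. */
static bool expect_completion_packet(bool read_ok, uint32_t last_error,
                                     uint8_t modes)
{
    /* Only the two flags above are defined for the mode mask. */
    assert((modes & ~0x3u) == 0);

    if (read_ok) {
        /* Synchronous success: a packet is still queued unless the
           skip-on-success flag is set, in which case the caller must
           consume the result inline. */
        return (modes & SKETCH_FILE_SKIP_COMPLETION_PORT_ON_SUCCESS) == 0;
    }

    /* Failure: only ERROR_IO_PENDING means the request is in flight and
       will complete through the port. Any other error queues nothing. */
    return last_error == SKETCH_ERROR_IO_PENDING;
}
```

For the calls in the trace (`ReadFile` returned FALSE with error 997, modes = 3) the helper returns true, so a packet should eventually be delivered to `GetQueuedCompletionStatusEx`. With the same modes, a `ReadFile` that returns TRUE would never produce one.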

This happens even when reading files one at a time, and I'm not sure why. I initially thought that the memory the LPOVERLAPPED_ENTRY pointer refers to might have become invalid, but checking the location with gdb shows it is still valid.

(gdb) x/12w 0x0000000007406340
0x7406340:      0x00000000      0x00000000      0x00000000      0x00000000
0x7406350:      0x00000000      0x00000000      0x00000000      0x00000000
0x7406360:      0x00000000      0x00000000      0x00000000      0x00000000

Does anyone have any idea what might be going on?

Phyx
  • Without seeing the code, all I can say is that the error is somewhere in your code. – RbMm Aug 26 '18 at 15:50
  • Just look at `0x7406340`. You say this is the address of the `OVERLAPPED`? First, this address does not appear in the `ReadFile` call in your listing. More importantly, look at its data: if the operation were still in progress (or had just failed), the first dword would be `0x103`, but here it is 0. That means either the operation has already finished or you overwrote the memory. If the operation had already finished, the second dword would be the number of bytes transferred by the read, but that is 0 as well. It looks like this is the wrong pointer, or you overwrote the overlapped structure. The fact that the memory itself is still valid tells us nothing. – RbMm Aug 26 '18 at 16:09
  • @RbMm Sorry, that was supposed to be `LPOVERLAPPED_ENTRY`, i.e. the location where `GetQueuedCompletionStatusEx` writes the list of completed operations. I didn't show the code because it is written in Haskell; I thought the API calls would be more useful here. But I can post the code if you want to see it. – Phyx Aug 26 '18 at 18:05
  • It would be much more useful to look at the `OVERLAPPED` passed for the request that did not finish. Are you sending the request to a file on a local disk? Make sure it did not complete without returning pending (you yourself disabled the IOCP packet for that case). Also (a separate note, not directly related): why do you use your own IOCP and call `GetQueuedCompletionStatusEx` yourself, instead of using the system's built-in functionality, e.g. via `BindIoCompletionCallback`? Of course, without source or binary code, and without debugging a case where the request does not finish, it is hard to say anything. – RbMm Aug 26 '18 at 18:43
  • Yes, the trace above shows that the `ReadFile` call returned `ERROR_IO_PENDING`. I am, however, handling the case where the operation completes immediately. I'm using completion ports to multiplex requests from green threads onto the same OS thread, so the scheduler in the language runtime can schedule other tasks while the I/O operation is blocked, and can cancel it if needed. As I mentioned, it all works fine 99% of the time, but every so often it just deadlocks. – Phyx Aug 26 '18 at 19:11
  • If the I/O returned `ERROR_IO_PENDING`, look at the overlapped structure for that request: is `0x103` there? Maybe the I/O really has not finished (if this is not a local file). Otherwise you must get a completion packet. I was not asking why you use IOCP, but why you use it directly. It is easier and no less efficient to let the system do most of the work for you, via `BindIoCompletionCallback` or the new thread pool API. – RbMm Aug 26 '18 at 19:40
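
The check suggested in the comments (inspect the `OVERLAPPED` passed to `ReadFile`, at `0x00000000073670e0` in the trace, rather than the `OVERLAPPED_ENTRY` array) can be sketched as follows. This mirrors what the SDK macro `HasOverlappedIoCompleted` does: it compares the low 32 bits of the `Internal` field against `STATUS_PENDING` (`0x103`). The struct and function names are hypothetical stand-ins covering only the first two fields, so the snippet compiles without the Windows headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same numeric value as STATUS_PENDING in the Windows headers. */
#define SKETCH_STATUS_PENDING 0x103u

/* Hypothetical stand-in for the first two fields of OVERLAPPED. */
typedef struct {
    uintptr_t Internal;     /* status code of the I/O request */
    uintptr_t InternalHigh; /* bytes transferred, meaningful once complete */
} sketch_overlapped_head;

/* Mirrors HasOverlappedIoCompleted: the request counts as complete as
   soon as the status is anything other than STATUS_PENDING. */
static bool sketch_io_completed(const sketch_overlapped_head *ov)
{
    assert(ov != NULL);
    return (uint32_t)ov->Internal != SKETCH_STATUS_PENDING;
}
```

Note that a zeroed structure also reports "completed" here, so an all-zero dump is ambiguous on its own: it fits a finished zero-byte read, a structure that was never handed to the kernel, or memory that has since been overwritten.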
