9

I'm communicating between two processes on different machines via a pipe, using IO completion routines.

Occasionally, when the completion routine for WriteFileEx gets called, the completion routine parameter dwErrorCode is 0 (i.e. no error), GetOverlappedResult returns true (i.e. no error), but dwNumberOfBytesTransfered does not match nNumberOfBytesToWrite in the call to WriteFileEx. I only see this on the client end of the pipe however.

If the number of bytes transferred does not match the number of bytes that was requested to transfer, how can this be deemed a success?

This is how the client's handle to the pipe is created:

mHPipe = CreateFile(pipeName,                 // pipe name 
                    GENERIC_READ |            // read and write access 
                    GENERIC_WRITE, 
                    0,                        // no sharing 
                    NULL,                     // default security attributes
                    OPEN_EXISTING,            // opens existing pipe 
                    FILE_FLAG_OVERLAPPED |    // overlapped
                    FILE_FLAG_WRITE_THROUGH,  // write through mode
                    NULL);                    // no template file 

// do some checking...

// The pipe connected; change to message-read mode. 
DWORD dwMode = PIPE_READMODE_MESSAGE; 
BOOL fSuccess = SetNamedPipeHandleState(mHPipe,   // pipe handle 
                                        &dwMode,  // new pipe mode 
                                        NULL,     // don't set maximum bytes 
                                        NULL);    // don't set maximum time 

Can anyone see why this would happen?

Thanks

EDIT:

The relevant WriteFileEx code is as follows:

void WINAPI CompletedWriteRoutine(DWORD dwErrorCode, DWORD dwNumberOfBytesTransfered, LPOVERLAPPED lpOverLap)
{
  BOOL fWrite = FALSE;
  LPPIPEINST lpPipeInst = (LPPIPEINST)lpOverLap;

  //
  // !  99.9% of the time, dwNumberOfBytesTransfered == lpPipeInst->cbDataSize
  //    but 0.1% of the time, they do not match
  //

  // Some stuff

  // Copy next message to send
  memcpy_s(lpPipeInst->chData, sizeof(lpPipeInst->chData), pMsg->msg, pMsg->size);
  lpPipeInst->cbDataSize = pMsg->size;

  // Some other stuff

  fWrite = WriteFileEx(lpPipeInst->hPipeInst,
                       lpPipeInst->chData,
                       lpPipeInst->cbDataSize,
                       (LPOVERLAPPED) lpPipeInst,
                       (LPOVERLAPPED_COMPLETION_ROUTINE)CompletedWriteRoutine);

  // Some other, other stuff
}

Where LPPIPEINST is declared as:

typedef struct 
{ 
  OVERLAPPED oOverlap;      // must remain first item
  HANDLE hPipeInst;
  TCHAR chData[BUFSIZE];
  DWORD cbDataSize;
} PIPEINST, *LPPIPEINST;

And the initial call to CompletedWriteRoutine is given the lpOverlap parameter declared thusly:

PIPEINST pipeInstWrite        = {0};
pipeInstWrite.hPipeInst       = client.getPipeHandle();
pipeInstWrite.oOverlap.hEvent = hEvent[eventWriteComplete];

EDIT:

After trying re-initializing the overlapped structure as Harry suggested, I noticed something peculiar. I memset the OVERLAPPED structure to zero before each WriteFileEx, and roughly 1/5000 completion routine callbacks, the cbWritten parameter and the OVERLAPPED structure's InternalHigh member was now set to the size of the previous message, instead of the most recent message. I added some logging to file on both the client and server ends of the pipe inside the completion routines, and the data sent and received at both ends was an exact match (and the correct, expected data). This then unveiled that in the time taken to write the data to a file, the InternalHigh member in the OVERLAPPED structure had changed to now reflect the size of the message I was expecting (cbWritten remains the old message size). I removed the file logging, and am now able to reproduce the issue like clockwork with this code:

void WINAPI CompletedWriteRoutine(DWORD dwErr, DWORD cbWritten, LPOVERLAPPED lpOverLap)
{
  LPPIPEINST lpPipeInst = (LPPIPEINST)lpOverLap;

  // Completion routine says it wrote the amount of data from the previous callback
  if (cbWritten != lpPipeInst->cbDataSize)
  {
    // Roughly 1 in 5000 callbacks ends up in here

    OVERLAPPED ovl1 = lpPipeInst->oOverlap; // Contains size of previous message, i.e. cbWritten
    Sleep(100);
    OVERLAPPED ovl2 = lpPipeInst->oOverlap; // Contains size of most recent message, i.e lpPipeInst->cbDataSize
  }
  ...
}

It seems that sometimes, the completion routine is being called before the OVERLAPPED structure and the completion routine input parameter is updated. I'm using MsgWaitForMultipleObjectsEx(eventLast, hEvent, INFINITE, QS_POSTMESSAGE, MWMO_ALERTABLE); for the completion routines to be called on Windows 7 64 bit.

This MSDN page says:

"The system does not use the OVERLAPPED structure after the completion routine is called, so the completion routine can deallocate the memory used by the overlapped structure."

...so apparently, what this code can reproduce should never happen?

Is this a WINAPI bug?

parrowdice
  • 1,902
  • 15
  • 24
  • 2
    If it is anything like normal network IO, when writing you need to loop until you have written all data, paying attention to how much was just written. Are you writing large amounts of data and possibly filling up the pipe so the reader isn't able to read it off fast enough for the writer to write it in one chunk? – Ryan Guthrie Nov 22 '12 at 18:09
  • I'm seeing this when writing small amounts, such as 7 bytes, and the completion routine says it writes 11 bytes (the pipe buffer is 4096 bytes). The pipe is using write through mode, which should send all of the data from WriteFileEx in one go, and the completion routine should take care of not needing to do any looping - as it's a callback for when the data was sent. – parrowdice Nov 22 '12 at 18:13
  • This question is related to Windows and the Windows API, make sure you tag it as such next time. Also, is this a C or a C++ question? – netcoder Nov 22 '12 at 18:17
  • Any chance you can post the `WriteFileEx()` invocation that is pushing your data, along with info concerning where the buffer handed to it for consumption is maintained (declared and/or allocated) ? – WhozCraig Nov 22 '12 at 20:03
  • @user1836044: there is no way that the completion routine can report 11 bytes were written if you asked `WriteFileEx()` to only write 7 bytes. There has to be a bug in your code, such as if your `OVERLAPPED` struct is corrupted, or you did not declare your routine signature correctly but used a type-cast to hide that error. – Remy Lebeau Nov 23 '12 at 06:14
  • I agree with @RemyLebeau. No way Windows is reporting more bytes written than you requested. – Carey Gregory Nov 23 '12 at 22:11
  • Have you left some relevant code out of WriteFileEx? You should be resetting the event, for one thing, and probably should reinitialize oOverlap as well, since Windows may have written to some of the internal members. Also, are you sure there's only ever one I/O operation taking place at a time? – Harry Johnston Nov 25 '12 at 21:17
  • @RemyLebeau and CareyGregory: I would be inclined to take your side too if I were not seeing this for myself. What's strange is that sometimes with 7 bytes requested, the completion routine says it has written 11 bytes, but then if you call GetOverlappedResult inside the completion routine with the same OVERLAPPED structure, that says that 7 bytes were written. Believe me, it's happening. – parrowdice Nov 26 '12 at 09:21
  • @HarryJohnston: The event is created as an auto-reset event - no need to manually reset it. No need to reset the OVERLAPPED structure either (see MSDN [link](http://msdn.microsoft.com/en-us/library/windows/desktop/aa365601%28v=vs.85%29.aspx)). The application will have read and write operations pending simultaneously, but they use different OVERLAPPED structures, which is perfectly legal. – parrowdice Nov 26 '12 at 09:24
  • 1
    @HarryJohnston: WriteFileEx() does not use the OVERLAPPED::hEvent field at all, so it does not even need to point at a valid Event. It is useful for carrying user-defined data in this case. – Remy Lebeau Nov 26 '12 at 16:48
  • @parrowdice: you showed your completion routine, but you did not show your initial WriteFileEx() call that is starting a new write operation. Also, you are declaring `pipeInstWrite` on the stack. You cannot do that. The documentation even says so. The `OVERLAPPED` has to remain in memory until the write operation completes. Allocate it on the heap, not the stack, and free it when you are done using it. – Remy Lebeau Nov 26 '12 at 17:00
  • @RemyLebeau: Thanks for your reply, however, in the post, it says "the initial call to CompletedWriteRoutine is given the lpOverlap parameter declared thusly". By this I mean that the first WriteFileEx() call is in the completion routine as shown - I call the completion routine myself the first time to start the cycle. – parrowdice Nov 26 '12 at 17:18
  • @RemyLebeau: Also, I believe pipeInstWrite can be declared on the stack. The documentation doesn't say it can't, it says "It should not be a variable that can go out of scope while the write operation is pending completion", which is true of my implementation - pipeInstWrite, although allocated on the stack, never goes out of scope while there is pending IO. – parrowdice Nov 26 '12 at 17:19
  • @parrowdice: "I call the completion routine myself the first time to start the cycle." You did not show that code or specifically say that earlier. Either way, my comment about stack usage still applies. You are declaring your `PIPEINST`, and thus its `OVERLAPPED`, on the stack (that much you did show). I assume (since you did not show it) that you are calling `CompletedWriteRoutine(..., &pipeInstWrite)`, which would pass the stack-based `LPOVERLAPPED` pointer to `WriteFileEx()`, which is wrong. – Remy Lebeau Nov 26 '12 at 20:16
  • @parrowdice: Does your code actually wait for `WriteFileEx()` to call `CompletedWriteRoutine()` again while your `PIPEINST` variable is still in scope? If so, how does it wait? You have not shown any of that code, so nobody cann tell you one way or the other whether your code is actually valid or not. – Remy Lebeau Nov 26 '12 at 20:18
  • @RemyLebeau: I did say I call the completion routine, but I agree, it could have been clearer. I'm not sure how you can state the stack usage is wrong. In simplified pseudo code: `create client pipe connection` `declare OVERLAPPED structure on the stack` `call CompletedWriteRoutine()` `loop until error or exit` `{wait for IO completion}` `cancel pending IO and wait for completion` `clean up` `OVERLAPPED out of scope here - which is valid` But on average, about 1/5000 completion routine calls in the loop has the issue I described. – parrowdice Nov 27 '12 at 11:32
  • 1
    Remy is right about WriteFileEx not using the event handle, my mistake. But I still think you should be reinitializing the OVERLAPPED structure before each call. This is a documented requirement, and sad to say, MS sample code is not always perfectly correct. See http://msdn.microsoft.com/en-us/library/windows/desktop/ms684342%28v=vs.85%29.aspx "Any unused members of this structure should always be initialized to zero before the structure is used in a function call." – Harry Johnston Nov 27 '12 at 21:02
  • @HarryJohnston: Thanks for the suggestion. The link you posted and [here](http://msdn.microsoft.com/en-us/library/windows/desktop/aa365748%28v=vs.85%29.aspx) say "... do not use after initialization to zero", and "The WriteFileEx function does use ... the OVERLAPPED structure. You should not change the value of these members", which is why I wasn't already re-initializing between calls. However, the FileIOCompletionRoutine docs say you can free the `OVERLAPPED` structure when the completion routine is called, so it must be safe to re-initialize then. I will try it and let you know if it works. – parrowdice Nov 28 '12 at 10:33
  • @HarryJohnston: I tried your suggestion - see the latest edit – parrowdice Nov 28 '12 at 19:28
  • As a troubleshooting measure, perhaps you could try allocating a huge array of OVERLAPPED structures (100000 or so) and then using a different one for each call? It would be interesting to see whether the problem still occurs, and also to see whether (on the affected calls) you're being given the right pointer or the previous one. (Note that, as Remy pointed out, you can store a pointer to your LPPIPEINST in hEvent.) – Harry Johnston Nov 28 '12 at 20:16
  • If you can reproduce the problem in a smallish program (without too much of your intellectual property!) I'd be very interested to take a look. You can post a link, or contact me by email me if you prefer. – Harry Johnston Nov 29 '12 at 01:12

1 Answers1

3

Added FILE_FLAG_NO_BUFFERING to the CreateFile call - haven't seen the problem since. Thanks everyone who commented for your time.

parrowdice
  • 1,902
  • 15
  • 24