
I have a program that reads from a serial port using overlapped I/O. I cannot understand how WaitForMultipleObjectsEx can time out (200 ms) when I have set a lower timeout (100 ms) in the SetCommTimeouts call. It happens only very rarely, but when it does, it often happens on multiple different serial ports simultaneously. Sometimes it also happens in different applications that use the same source code to communicate on a different COM port.

The program first opens the COM port with:

 CreateFile(szCommDevice, GENERIC_READ | GENERIC_WRITE, 0, NULL,
             OPEN_EXISTING, FILE_FLAG_OVERLAPPED, NULL); 

After the port is opened, it calls GetCommProperties and PurgeComm, and then:

timeouts.ReadIntervalTimeout = 0;
timeouts.ReadTotalTimeoutMultiplier = 0;
timeouts.ReadTotalTimeoutConstant = 100;
SetCommTimeouts(hComHandle, &timeouts);
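For reference, the documented rule for the total read timeout is the multiplier times the number of bytes requested, plus the constant. The sketch below only restates that arithmetic; `total_read_timeout_ms` is a hypothetical helper name, not a Win32 function, and it ignores the interval timeout, which is disabled here because ReadIntervalTimeout is 0.

```c
#include <stdint.h>

/* Hypothetical helper (not part of the Win32 API): total read timeout in
   milliseconds for one read request, following the documented rule
       ReadTotalTimeoutMultiplier * bytes_requested + ReadTotalTimeoutConstant.
   A result of 0 means that no total timeout applies to the read. */
uint64_t total_read_timeout_ms(uint32_t multiplier_ms,
                               uint32_t constant_ms,
                               uint32_t bytes_requested)
{
    /* Widen before multiplying so large requests cannot overflow 32 bits. */
    return (uint64_t)multiplier_ms * bytes_requested + constant_ms;
}
```

With the values above (multiplier 0, constant 100), every read request should therefore complete or time out after about 100 ms, regardless of dwRequestedSize.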

It also calls SetCommState and SetCommMask before it starts reading the port in a loop with:

dwRequestedSize = dwClearError();
if( dwRequestedSize <= 0 )
    dwRequestedSize = 1;
else if( dwRequestedSize > nBufferSize )
    dwRequestedSize = nBufferSize;

if (!ReadFile(hComHandle, clReceived.pBuffer, dwRequestedSize , &dwRead, &osReader ))
{
    DWORD dwLastError = GetLastError();
    if ( dwLastError != ERROR_IO_PENDING)
    {
        bCloseComPort();
        continue;
    }
    else
        fWaitingOnRead = TRUE;
}
else // read completed immediately
{
    if( dwRead > 0 )
    {
        // Add To Receive Buffer
    }
}

dwRequestedSize is initialized from the cbInQue value returned by ClearCommError, so it is the number of bytes waiting in the input buffer. The code also calls WaitCommEvent with overlapped I/O before it starts to wait with:

DWORD dwRes = WaitForMultipleObjectsEx(2 , hEventArray, FALSE,  200 , true );
switch(dwRes)
{
    // Read completed.
    case WAIT_OBJECT_0:
        if (!GetOverlappedResult(hComHandle, &osReader, &dwRead, FALSE))
        {
           bCloseComPort();
        }   // Error in communications; report it.
        else // Read completed successfully.
        {
            if( dwRead > 0 )
            {
                // Add To Receive Buffer
            }
        }
        //  Reset flag so that another operation can be issued.
        fWaitingOnRead = FALSE;
        break;

    // Status completed
    case WAIT_OBJECT_0 + 1:
        if (!GetOverlappedResult(hComHandle, &osStatus, &dwOvRes, FALSE))
        {
           bCloseComPort();
        }   // Error in communications; report it.
        else
        {
            ReportStatusEvent(dwCommEvent);
        }
        fWaitingOnStat = FALSE;
        break;

    case WAIT_TIMEOUT:
        // Why can I get here?
        break;
}

Am I doing something wrong here? Thanks for any input on this.

Some additional information.

I understand that paging out may be an issue, but I am still a bit reluctant to accept this, for two reasons.

1) The computers run only the software described, there is no user interaction with the PC, and it has plenty of RAM (4 GB). The application uses only about 50 MB of memory, so the PC always has at least 2 GB of free memory.

2) The application runs 24/7 and the activity around this code is very heavy: it is always either sending or receiving data. So I assume Windows would not select this code or memory as a candidate for paging out.

Maybe there is something about paging that I do not fully understand; in that case, can you explain a bit more?

I think I will add a check to see how long the wait actually is. I assume that if the wait is about 200 ms, then paging or other Windows lockups can be ruled out, because it would be unlikely for such a delay to last exactly 200 ms. Agree? I assume GetTickCount would have sufficient precision for this and would add the least overhead.
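A minimal sketch of that check, assuming tick values taken from GetTickCount immediately before the wait and again in the WAIT_TIMEOUT case. The names `elapsed_ms` and `wait_matches_timeout` are hypothetical helpers. GetTickCount is documented as having a resolution of typically 10 to 16 ms, so the comparison needs a tolerance of roughly that size, and the subtraction has to be unsigned so that it survives the counter wrapping to zero.

```c
#include <stdint.h>

/* Elapsed milliseconds between two 32-bit tick values such as those
   returned by GetTickCount. Unsigned subtraction is modulo 2^32, so the
   result stays correct even if the counter wrapped between the samples. */
uint32_t elapsed_ms(uint32_t start_tick, uint32_t end_tick)
{
    return end_tick - start_tick;
}

/* Returns 1 if the measured wait is within tolerance_ms of timeout_ms,
   which would suggest the wait really ran to its own timeout, and 0 if
   the wait was clearly shorter or longer than that. */
int wait_matches_timeout(uint32_t elapsed, uint32_t timeout_ms,
                         uint32_t tolerance_ms)
{
    uint32_t diff = elapsed > timeout_ms ? elapsed - timeout_ms
                                         : timeout_ms - elapsed;
    return diff <= tolerance_ms;
}

/* Intended use inside the read loop (Windows only, shown as a comment):
       DWORD dwStart = GetTickCount();
       DWORD dwRes = WaitForMultipleObjectsEx(2, hEventArray, FALSE, 200, TRUE);
       ...
       case WAIT_TIMEOUT:
           // log elapsed_ms(dwStart, GetTickCount()) here
*/
```

A logged value close to 200 ms would point at the wait itself expiring, while a much larger value would point at the thread being stalled.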

Kennet
  • The timeouts are rather short, you'll run into trouble on a demand-paged virtual memory operating system like Windows when the machine is heavily loaded and the OS is forced to page-out driver or app code. Takes a while to page them back in. Use timeouts to detect gross problems. Seconds. – Hans Passant Sep 15 '15 at 16:44
  • Have you measured the actual length of time spent in the wait? – Harry Johnston Sep 15 '15 at 22:45
  • Hi thanks for the input. – Kennet Sep 16 '15 at 06:20
  • I have updated the question with some additional information / answers – Kennet Sep 16 '15 at 06:45
  • It would be worth temporarily changing the wait timeout to a second, and seeing whether that makes any difference - if it waits the full second and then times out, then the serial port I/O isn't timing out. If it waits only the 100ms, then the serial port I/O is timing out but the wait is responding in an unexpected way. (Actually it wouldn't particularly surprise me if a timeout from the underlying I/O could cause a wait operation to incorrectly report a timeout. That sounds like a fairly typical abstraction layer leak.) – Harry Johnston Sep 16 '15 at 22:08
  • It is also possible that I/O timeouts aren't supposed to be used in combination with asynchronous I/O, I'm not sure. Certainly it is redundant. – Harry Johnston Sep 16 '15 at 22:09
