2

I'm trying to write code to make it so that a microcontroller can communicate over USB with a PC. I think that the majority of the code is in place, and communication generally works fine so long as it's slow (> ~5ms per message). However, I'd like to make it work with a 0% drop rate at much higher speeds (preferably at a rate of a few dozen to a few hundred microseconds between each message). Communication is configured to use full-speed USB 2.0 protocol, so I have a transmission rate of around 12 MHz.

I'm finding it harder to debug issues with USB relative to (say) CAN because of how the protocol works, but I believe the issue is with the PC-side driver that I wrote to interface with the device.

The device and driver use the two standard control endpoints (physical endpoints 0 and 1) and additionally use two bulk endpoints:

  • BULK 2 Out (physical endpoint 4)
  • BULK 2 In (physical endpoint 5)

Both of these have a maximum packet size of 64 bytes.

The driver I created was more-or-less based on the example code on Microsoft's site. I exported a receiveMessage function and, originally, every time this was called it allocated a 64-byte buffer and requested 64 bytes of data from the BULK 2 In pipe using WinUsb_ReadPipe. There's a 1ms timeout to prevent the application from hanging if there's no data to read.

This essentially means that the rate at which data is read from the pipe is limited to the rate that the application can poll it. If the device is writing data to the pipe faster than the application can read from it, I'm sure that's going to cause problems. I tried to solve the issue by creating a queue and a thread within the driver that does nothing but continuously poll the pipe and store any received messages in the queue, which the application can then read from at leisure using `receiveMessage'. This didn't work very well, though.

What I'd like is a code sample that demonstrates a way to ensure the following:

  • Data can be read from the BULK In pipe as fast as possible.
  • Has a way to 'buffer' messages so that the device can write to the pipe faster than the application can process them for short durations, but no messages are dropped.

I think that one possible approach might be to set up another bulk pipeline between the device and the driver. The device should then store all outgoing messages into a buffer and also keep a separate array containing the number of bytes of each of the messages at the corresponding index within this buffer. Then whenever the application wants to read a set of messages, the driver uses one of these pipelines to request the message byte array from the device. For each value in this array, the driver issues a request for that number of bytes using WinUsb_ReadPipe, which the device services by sending each message across the USB bus in the same order. I'm not sure if this would work or if it's overly complicated, though.

My current initialisation code:

_declspec(dllexport) int initialiseC()
{
    HRESULT               hr;
    BOOL                  bResult;
    BOOL                  noDevice;
    ULONG                 lengthReceived;
    ULONG                 cbSize;

    cbSize = 0;

    //OutputDebugString((LPCWSTR)"Beginning of initialisation\n");

    //
    // Find a device connected to the system that has WinUSB installed using our
    // INF
    //
    hr = OpenDevice(&deviceData, &noDevice);

    if (FAILED(hr)) {

        if (noDevice) {

            printf("Device not connected or driver not installed\n");
            MessageBoxA(NULL, "Device not connected or driver not installed", "Debug", MB_OK);

        }
        else {

            printf("Failed looking for device, HRESULT 0x%x\n", hr);
            MessageBoxA(NULL, "Failed looking for device", "Debug", MB_OK);
        }

        getchar();
        return 1;
    }

    //
    // Get device descriptor
    //
    bResult = WinUsb_GetDescriptor(deviceData.WinusbHandle,
        USB_DEVICE_DESCRIPTOR_TYPE,
        0,
        0,
        (PBYTE)&deviceDesc,
        sizeof(deviceDesc),
        &lengthReceived);

    if (FALSE == bResult || lengthReceived != sizeof(deviceDesc)) {

        printf("Error among LastError %d or lengthReceived %d\n",
            FALSE == bResult ? GetLastError() : 0,
            lengthReceived);
        MessageBoxA(NULL, "Initialisation error", "Debug", MB_OK);
        CloseDevice(&deviceData);
        getchar();
        return 2;
    }

    //
    // Print a few parts of the device descriptor
    //
    printf("Device found: VID_%04X&PID_%04X; bcdUsb %04X\n",
        deviceDesc.idVendor,
        deviceDesc.idProduct,
        deviceDesc.bcdUSB);

    // Retrieve pipe information.
    bResult = QueryDeviceEndpoints(deviceData.WinusbHandle, &pipeID);
    if (!bResult)
    {
        printf("Error querying device endpoints\n");
        MessageBoxA(NULL, "Error querying device endpoints", "Debug", MB_OK);
        CloseDevice(&deviceData);
        getchar();
        return 3;
    }

    // Set timeout for read requests.
    ULONG timeout = 1; // 1 ms.
    WinUsb_SetPipePolicy(deviceData.WinusbHandle, pipeID.PipeInId,
        PIPE_TRANSFER_TIMEOUT, sizeof(timeout), &timeout);

    // Create message polling thread.
    messagePollerHandle = CreateThread(NULL, 0, messagePoller, NULL, 0, NULL);
    if (messagePollerHandle == NULL)
    {
        printf("Error creating message poller thread\n");
        MessageBoxA(NULL, "Error creating message poller thread", "Debug", MB_OK);
        CloseDevice(&deviceData);
        getchar();
        return 4;
    }

    initStatus = 1;
    return 0;
}

I internally poll for messages within the driver with this code:

DWORD WINAPI messagePoller(LPVOID lpParam)
{
    BOOL                  bResult;
    ULONG                 cbReceived = 0;
    USB_TYPE_T            usbMessageIn;

    while (initStatus)
    {
        UCHAR* receiveBuffer = (UCHAR*)LocalAlloc(LPTR, MAX_PACKET_SIZE*1000);
        bResult = ReadFromBulkEndpoint(deviceData.WinusbHandle, &pipeID.PipeInId, MAX_PACKET_SIZE*1000, &cbReceived, receiveBuffer);

        if (!bResult)
        {
            printf("Error reading data from endpoint\n");
            //MessageBoxA(NULL, "Error reading data from endpoint", "Debug", MB_OK);
            getchar();
            CloseDevice(&deviceData);
            usbMessageIn.len = 0;
            return 1;
        }

        if (cbReceived == 0)
        {
            LocalFree(receiveBuffer);
            continue;
        }

        const char* input = reinterpret_cast<const char*>(receiveBuffer);

        strcpy_s(usbMessageIn.string, input);
        usbMessageIn.len = strlen(input);

        while (receiveMessageSema);
        receiveMessageSema = TRUE;
        while (receiveQueue.size() >= RECEIVE_QUEUE_MAX_SIZE) receiveQueue.pop_front();
        receiveQueue.push_back(usbMessageIn);

        receiveMessageSema = FALSE;

        LocalFree(receiveBuffer);
    }

    return 0;
}

The exported receiveMessage code that the application using the driver can use to receive messages is given here:

_declspec(dllexport) void receiveUSBMessageC(USB_TYPE_T *usbMessageIn)
{
    if (receiveMessageSema || receiveQueue.empty())
    {
        usbMessageIn->len = 0;
        strcpy_s(usbMessageIn->string, "");
    }
    else
    {
        receiveMessageSema = TRUE;
        *usbMessageIn = receiveQueue.front();
        receiveQueue.pop_front();
        receiveMessageSema = FALSE;
    }
}

EDIT

Based on Hasturkun's suggestions, I made the following changes to the code.

Within initialisation:

...

readEventHandle[0] = CreateEvent(NULL, FALSE, TRUE, TEXT("ReadEvent0"));
readEventHandle[1] = CreateEvent(NULL, FALSE, TRUE, TEXT("ReadEvent1"));

if (!readEventHandle[0] || !readEventHandle[1])
{
    printf("readEvent creation error\n");
    MessageBoxA(NULL, "readEvent creation error", "Debug", MB_OK);
    CloseDevice(&deviceData);
    getchar();
    return 5;
}

readPipeHandle[0] = CreateThread(NULL, 0, readPipe, &readPipeParam0, 0, NULL);
readPipeHandle[1] = CreateThread(NULL, 0, readPipe, &readPipeParam1, 0, NULL);

if (!readPipeHandle[0] || !readPipeHandle[1])
{
    printf("readPipe creation error\n");
    MessageBoxA(NULL, "readPipe creation error", "Debug", MB_OK);
    CloseDevice(&deviceData);
    getchar();
    return 6;
}

readOverlapped[0].hEvent = readEventHandle[0];
readOverlapped[1].hEvent = readEventHandle[1];

...

Read pipe thread:

DWORD WINAPI readPipe(LPVOID lpParam)
{
    BOOL         bResult;
    ULONG        cbReceived = 0;
    USB_TYPE_T   usbMessageIn;
    DWORD        waitResult;
    uint8_t      index = *static_cast<uint8_t*>(lpParam);
    BOOLEAN      init = FALSE;

    UCHAR receiveBuffer[MAX_PACKET_SIZE] = { 0 };

    LARGE_INTEGER start[2], end[2], freq;

    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start[index]);
    uint32_t microDelta;

    while (initStatus)
    {
        waitResult = WaitForSingleObject(readEventHandle[index], INFINITE);

        switch (waitResult)
        {
        case WAIT_OBJECT_0:
            if (init)
            {
                WinUsb_GetOverlappedResult(deviceData.WinusbHandle, &readOverlapped[index], &cbReceived, FALSE);

                if (cbReceived > 0)
                {
                    // Read data, set usbMessageIn and add to queue.
                    strcpy_s(usbMessageIn.string, reinterpret_cast<const char*>(receiveBuffer));
                    usbMessageIn.len = cbReceived;

                    mtx.lock();
                    while (receiveQueue.size() >= RECEIVE_QUEUE_MAX_SIZE)
                    {
                        MessageBoxA(NULL, "Message receive queue full", "Debug", MB_OK);
                        receiveQueue.pop_front();
                    }
                    receiveQueue.push_back(usbMessageIn);

#ifdef CREATE_DEBUG_LOG
                    QueryPerformanceCounter(&end[index]);
                    microDelta = (end[index].QuadPart - start[index].QuadPart) * 1000000 / freq.QuadPart;
                    std::string str = usbMessageIn.string;
                    while (str.length() < 24) str += " ";

                    fprintf(logFile, "Message %s  (Len: %d)  (Queue size: %d)  (Delta: %6d us)  (Thread: %d)\n",
                        str.c_str(), cbReceived, receiveQueue.size(), microDelta, index);
                    QueryPerformanceCounter(&start[index]);
#endif

                    mtx.unlock();


                }
            }
            else
            {
                init = TRUE;
            }

            // Create another read request.
            std::fill(receiveBuffer, receiveBuffer + sizeof(receiveBuffer), 0);
            bResult = ReadFromBulkEndpoint(deviceData.WinusbHandle, &pipeID.PipeInId, MAX_PACKET_SIZE,
                NULL, receiveBuffer, &readOverlapped[index]);

            if (!bResult && GetLastError() != ERROR_IO_PENDING)
            {
                printf("Error reading data from endpoint\n");
                MessageBoxA(NULL, "Error reading data from endpoint", "Debug", MB_OK);
                getchar();
                CloseDevice(&deviceData);
                usbMessageIn.len = 0;
                return 1;
            }
            break;

        default:
            MessageBoxA(NULL, "Error handling read event", "Debug", MB_OK);
            break;
        }
    }

    return 0;
}

I also commented out everything to do with messagePoller, since readPipe handles all of that now.

However, I'm still encountering performance issues.

EDIT 2

Updated readPipe code above.

I'm seriously starting to wonder whether it is my driver that's the issue or the microcontroller. With a protocol like CAN it would be much easier to tell where the problem lies...

I got my driver to produce a log of all the messages it received as it received them, including additional details such as the time delta between a thread receiving two messages, and which thread is processing the message (I have two). I get an output like this:

Message 000000050FFF00640064      (Len: 20)  (Queue size: 1)  (Delta: 573120 us)  (Thread: 0)
Message 000000070000323232323232  (Len: 24)  (Queue size: 1)  (Delta: 593050 us)  (Thread: 1)
Message 000000070100323232323232  (Len: 24)  (Queue size: 1)  (Delta:  39917 us)  (Thread: 0)
Message 000000090000              (Len: 12)  (Queue size: 1)  (Delta:  39950 us)  (Thread: 1)
Message 0000000B0000              (Len: 12)  (Queue size: 1)  (Delta:  59842 us)  (Thread: 0)
Message 0000000D0FFF001B003A001B  (Len: 24)  (Queue size: 1)  (Delta:  59979 us)  (Thread: 1)
Message 0000030D001F              (Len: 12)  (Queue size: 2)  (Delta:  20207 us)  (Thread: 0)
Message 0000020D00280024001B002D  (Len: 24)  (Queue size: 3)  (Delta:    227 us)  (Thread: 1)
Message 0000000F000F000000000000  (Len: 24)  (Queue size: 1)  (Delta:  39890 us)  (Thread: 0)
Message 0000010F0000              (Len: 12)  (Queue size: 2)  (Delta:  39902 us)  (Thread: 1)
Message 0000001100FF001D001D0020  (Len: 24)  (Queue size: 1)  (Delta:  19827 us)  (Thread: 0)
Message 000001110FFF0020001E001E  (Len: 24)  (Queue size: 2)  (Delta:  19824 us)  (Thread: 1)
Message 00000211001E              (Len: 12)  (Queue size: 3)  (Delta:    224 us)  (Thread: 0)
Message 0000001300                (Len: 10)  (Queue size: 1)  (Delta:  19996 us)  (Thread: 1)
Message 0000001D4000              (Len: 12)  (Queue size: 1)  (Delta:  63864 us)  (Thread: 0)
Message 0000000D0FFF001600310016  (Len: 24)  (Queue size: 1)  (Delta: 4025107 us)  (Thread: 1)
Message 0000030D001F              (Len: 12)  (Queue size: 2)  (Delta: 3981220 us)  (Thread: 0)
Message 0000020D002800240016002D  (Len: 24)  (Queue size: 3)  (Delta:    326 us)  (Thread: 1)
Message 0000000D0FFF001600310016  (Len: 24)  (Queue size: 1)  (Delta:  58885 us)  (Thread: 0)
Message 0000030D0024              (Len: 12)  (Queue size: 2)  (Delta:  58852 us)  (Thread: 1)
Message 0000020D0024001F001F0031  (Len: 24)  (Queue size: 3)  (Delta:    310 us)  (Thread: 0)
Message 0000000D0FFF001B0036001B  (Len: 24)  (Queue size: 1)  (Delta:  49755 us)  (Thread: 1)
Message 0000030D0024              (Len: 12)  (Queue size: 2)  (Delta:  49886 us)  (Thread: 0)
Message 0000020D00240024001B0036  (Len: 24)  (Queue size: 3)  (Delta:    447 us)  (Thread: 1)
Message 0000000D0FFF001600360016  (Len: 24)  (Queue size: 1)  (Delta:  49703 us)  (Thread: 0)
Message 0000030D001F              (Len: 12)  (Queue size: 2)  (Delta:  49589 us)  (Thread: 1)
Message 0000020D001F0024001B002D  (Len: 24)  (Queue size: 3)  (Delta:    357 us)  (Thread: 0)
Message 0000000D0FFF001600310016  (Len: 24)  (Queue size: 1)  (Delta:  49896 us)  (Thread: 1)
Message 0000030D001F              (Len: 12)  (Queue size: 2)  (Delta:  49860 us)  (Thread: 0)
Message 0000020D0024001F001B002D  (Len: 24)  (Queue size: 3)  (Delta:    315 us)  (Thread: 1)
Message 0000000D0FFF00160036001B  (Len: 24)  (Queue size: 1)  (Delta:  49724 us)  (Thread: 0)
Message 0000030D001F              (Len: 12)  (Queue size: 2)  (Delta:  49891 us)  (Thread: 1)
Message 0000020D00240024001B0031  (Len: 24)  (Queue size: 3)  (Delta:    452 us)  (Thread: 0)
Message 0000000D0FFF001B00360016  (Len: 24)  (Queue size: 1)  (Delta:  49742 us)  (Thread: 1)

Essentially what's happening is that a number of 'startup' messages are sent between the application and the device when the application is first launched. After about 4 seconds I then use the application to request a steady stream of messages from the device consisting of a group of 4 messages that are sent at 50ms intervals:

  1. 0000000D...
  2. 0000010D...
  3. 0000020D...
  4. 0000030D...

It looks like the driver is actually performing really well, since we can see it consistently reporting values in the few hundreds of microseconds at the points we expect. However:

  • The 30D... message seems to be consistently arriving before the 20D... message
  • The 10D... message is consistently getting dropped

This could be a problem with the microcontroller code. I'm using double-buffered endpoints so it could make sense why messages are arriving out of order.

One thing is that I'm not explicitly waiting for ACKs on the bulk IN pipe in my microcontroller code before sending subsequent messages. That might be the issue, although I've tried doing that previously and it didn't seem to have much of an effect.

Below are screenshots of USBLyzer output if they help at all (taken during a different run so the data won't be exactly identical but still quite similar). They're screenshots just because the log files are poorly formatted.

Direct link

Direct link

EDIT 3

It seems like my microcontroller is sending messages to the driver with only around 30 microseconds between each message (at max rate), whereas based on the logs it seems like it takes somewhere between 200-500 microseconds for the driver to process one message. That's quite a discrepancy. The real issue is that something lower-level than my driver software is sending ACKs to the microcontroller at the same rate that it's sending messages even though my driver can't keep up, so I can't throttle it based on that. It seems like I might need to explicitly send a message from the driver to the microcontroller on the BULK Out pipe saying "I'm ready to receive another message" every time the driver receives a message, but that seems like it'll really slow things down. Is there any better alternative?

Tagc
  • 8,736
  • 7
  • 61
  • 114
  • 3
    Not an answer, but my suggestion to you is to have 2 read requests up (So that you have at least one request in flight at any time). I'd also suggest using an `OVERLAPPED` structure + events to handle this, that way you wait on an event, rather than a blocking I/O call. I'd also recommend allocating buffers in place and queuing those, rather than copying them. (I'd also suggest using a critical section, mutex, or semaphore instead of the `receiveMessageSema` variable you're using now.) Either way, data in BULK endpoints typically shouldn't vanish if you're too slow in reading it. – Hasturkun Aug 26 '14 at 10:15
  • @Hasturkun Thanks. Do you mean something like this: I have two `OVERLAPPED` structures. I have a thread like `messagePoller` above, except it issues two read requests, each using a different structure. Whenever one of them completes, it stores the received message in a shared queue (which the application can read from at its leisure) and re-issues the request? – Tagc Aug 26 '14 at 10:41
  • yep, that's it exactly. – Hasturkun Aug 26 '14 at 10:44
  • @Hasturkun I implemented the changes but I'm still encountering performance problems. If you're still available, I've included the changes I've made in the original post if there's anything you see there that doesn't look right. I'm also not sure if I can allocate buffers in place since I need to cast the raw data from a char array to an unsigned char array first. – Tagc Aug 26 '14 at 13:10
  • You don't really need two threads (you can use `WaitForMultipleObjects` on both in a single thread). Re: allocation, bytes are bytes, casting a signed char pointer to an unsigned one won't affect the contents. I'd also suggest using critical sections instead of mutexes (slightly cheaper, though that might not be an issue here). I'd also suggest looking at the RAW_IO pipe policy. Finally, I'd suggest using some USB sniffer/analyzer to check if the device is generating data at the rate you expect, or if it's mostly returning NAKs (i.e., no data is available when polled) – Hasturkun Aug 26 '14 at 13:34
  • @Hasturkun "Re: allocation, bytes are bytes, casting a signed char pointer to an unsigned one won't affect the contents.", right but my application expects it as a signed char and changing it to unsigned causes issues, so I'd have to do additional processing later anyway. Either way, I don't think it should cause much of a performance hit. I'm actually questioning whether the problem is with the driver at all. If you still have time to help, I've updated the original post. – Tagc Aug 26 '14 at 14:59
  • I actually meant the alternative, i.e., cast the buffer when calling `WinUsb_ReadPipe`, it really won't care. Slightly off topic, I'd generally expect the device to buffer and/or wait when bulk transfers are involved. Also, if you have control over the device and have strict timing requirements, you may want to consider using an interrupt or isochronous endpoint instead of bulk. Unfortunately, I can't currently look at the logs you posted (and I'm not sure I'd be able to add much insight) – Hasturkun Aug 26 '14 at 15:20
  • @Hasturkun "I actually meant the alternative, i.e., cast the buffer when calling WinUsb_ReadPipe." Good point! I'll try that later on if everything else gets resolved. "Slightly off topic, I'd generally expect the device to buffer and/or wait when bulk transfers are involved." That seems to be the thing, actually - I don't think the microcontroller is waiting properly. Details updated in OP. Edit: As for your other suggestions, isochronous transfers can't be used for anything lower than Windows 8.1 using WinUSB, and I don't think interrupt poll rates can be programmed faster than 1 ms. – Tagc Aug 26 '14 at 15:30
  • @Hasturkun I actually find I get the best performance with just using one thread and one event (~400-500 us between receiving messages). Using two threads with each handling one event or one thread handling two events reduces performance (~900-1000 us between receiving messages). Anyway, the USB code is working fairly well now, so if you want, you can retype your suggestions as an answer and I will mark it as accepted. – Tagc Aug 27 '14 at 11:44

0 Answers0