I'm trying to write code to make it so that a microcontroller can communicate over USB with a PC. I think that the majority of the code is in place, and communication generally works fine so long as it's slow (> ~5ms per message). However, I'd like to make it work with a 0% drop rate at much higher speeds (preferably at a rate of a few dozen to a few hundred microseconds between each message). Communication is configured to use full-speed USB 2.0 protocol, so I have a transmission rate of around 12 MHz.
I'm finding it harder to debug issues with USB relative to (say) CAN because of how the protocol works, but I believe the issue is with the PC-side driver that I wrote to interface with the device.
The device and driver use the two standard control endpoints (physical endpoints 0 and 1) and additionally use two bulk endpoints:
- BULK 2 Out (physical endpoint 4)
- BULK 2 In (physical endpoint 5)
Both of these have a maximum packet size of 64 bytes.
The driver I created was more-or-less based on the example code on Microsoft's site. I exported a receiveMessage
function and, originally, every time this was called it allocated a 64-byte buffer and requested 64 bytes of data from the BULK 2 In pipe using WinUsb_ReadPipe
. There's a 1ms timeout to prevent the application from hanging if there's no data to read.
This essentially means that the rate at which data is read from the pipe is limited to the rate that the application can poll it. If the device is writing data to the pipe faster than the application can read from it, I'm sure that's going to cause problems. I tried to solve the issue by creating a queue and a thread within the driver that does nothing but continuously poll the pipe and store any received messages in the queue, which the application can then read from at leisure using `receiveMessage'. This didn't work very well, though.
What I'd like is a code sample that demonstrates a way to ensure the following:
- Data can be read from the BULK In pipe as fast as possible.
- Has a way to 'buffer' messages so that the device can write to the pipe faster than the application can process them for short durations, but no messages are dropped.
I think that one possible approach might be to set up another bulk pipeline between the device and the driver. The device should then store all outgoing messages into a buffer and also keep a separate array containing the number of bytes of each of the messages at the corresponding index within this buffer. Then whenever the application wants to read a set of messages, the driver uses one of these pipelines to request the message byte array from the device. For each value in this array, the driver issues a request for that number of bytes using WinUsb_ReadPipe
, which the device services by sending each message across the USB bus in the same order. I'm not sure if this would work or if it's overly complicated, though.
My current initialisation code:
_declspec(dllexport) int initialiseC()
{
HRESULT hr;
BOOL bResult;
BOOL noDevice;
ULONG lengthReceived;
ULONG cbSize;
cbSize = 0;
//OutputDebugString((LPCWSTR)"Beginning of initialisation\n");
//
// Find a device connected to the system that has WinUSB installed using our
// INF
//
hr = OpenDevice(&deviceData, &noDevice);
if (FAILED(hr)) {
if (noDevice) {
printf("Device not connected or driver not installed\n");
MessageBoxA(NULL, "Device not connected or driver not installed", "Debug", MB_OK);
}
else {
printf("Failed looking for device, HRESULT 0x%x\n", hr);
MessageBoxA(NULL, "Failed looking for device", "Debug", MB_OK);
}
getchar();
return 1;
}
//
// Get device descriptor
//
bResult = WinUsb_GetDescriptor(deviceData.WinusbHandle,
USB_DEVICE_DESCRIPTOR_TYPE,
0,
0,
(PBYTE)&deviceDesc,
sizeof(deviceDesc),
&lengthReceived);
if (FALSE == bResult || lengthReceived != sizeof(deviceDesc)) {
printf("Error among LastError %d or lengthReceived %d\n",
FALSE == bResult ? GetLastError() : 0,
lengthReceived);
MessageBoxA(NULL, "Initialisation error", "Debug", MB_OK);
CloseDevice(&deviceData);
getchar();
return 2;
}
//
// Print a few parts of the device descriptor
//
printf("Device found: VID_%04X&PID_%04X; bcdUsb %04X\n",
deviceDesc.idVendor,
deviceDesc.idProduct,
deviceDesc.bcdUSB);
// Retrieve pipe information.
bResult = QueryDeviceEndpoints(deviceData.WinusbHandle, &pipeID);
if (!bResult)
{
printf("Error querying device endpoints\n");
MessageBoxA(NULL, "Error querying device endpoints", "Debug", MB_OK);
CloseDevice(&deviceData);
getchar();
return 3;
}
// Set timeout for read requests.
ULONG timeout = 1; // 1 ms.
WinUsb_SetPipePolicy(deviceData.WinusbHandle, pipeID.PipeInId,
PIPE_TRANSFER_TIMEOUT, sizeof(timeout), &timeout);
// Create message polling thread.
messagePollerHandle = CreateThread(NULL, 0, messagePoller, NULL, 0, NULL);
if (messagePollerHandle == NULL)
{
printf("Error creating message poller thread\n");
MessageBoxA(NULL, "Error creating message poller thread", "Debug", MB_OK);
CloseDevice(&deviceData);
getchar();
return 4;
}
initStatus = 1;
return 0;
}
I internally poll for messages within the driver with this code:
DWORD WINAPI messagePoller(LPVOID lpParam)
{
BOOL bResult;
ULONG cbReceived = 0;
USB_TYPE_T usbMessageIn;
while (initStatus)
{
UCHAR* receiveBuffer = (UCHAR*)LocalAlloc(LPTR, MAX_PACKET_SIZE*1000);
bResult = ReadFromBulkEndpoint(deviceData.WinusbHandle, &pipeID.PipeInId, MAX_PACKET_SIZE*1000, &cbReceived, receiveBuffer);
if (!bResult)
{
printf("Error reading data from endpoint\n");
//MessageBoxA(NULL, "Error reading data from endpoint", "Debug", MB_OK);
getchar();
CloseDevice(&deviceData);
usbMessageIn.len = 0;
return 1;
}
if (cbReceived == 0)
{
LocalFree(receiveBuffer);
continue;
}
const char* input = reinterpret_cast<const char*>(receiveBuffer);
strcpy_s(usbMessageIn.string, input);
usbMessageIn.len = strlen(input);
while (receiveMessageSema);
receiveMessageSema = TRUE;
while (receiveQueue.size() >= RECEIVE_QUEUE_MAX_SIZE) receiveQueue.pop_front();
receiveQueue.push_back(usbMessageIn);
receiveMessageSema = FALSE;
LocalFree(receiveBuffer);
}
return 0;
}
The exported receiveMessage
code that the application using the driver can use to receive messages is given here:
_declspec(dllexport) void receiveUSBMessageC(USB_TYPE_T *usbMessageIn)
{
if (receiveMessageSema || receiveQueue.empty())
{
usbMessageIn->len = 0;
strcpy_s(usbMessageIn->string, "");
}
else
{
receiveMessageSema = TRUE;
*usbMessageIn = receiveQueue.front();
receiveQueue.pop_front();
receiveMessageSema = FALSE;
}
}
EDIT
Based on Hasturkun's suggestions, I made the following changes to the code.
Within initialisation:
...
readEventHandle[0] = CreateEvent(NULL, FALSE, TRUE, TEXT("ReadEvent0"));
readEventHandle[1] = CreateEvent(NULL, FALSE, TRUE, TEXT("ReadEvent1"));
if (!readEventHandle[0] || !readEventHandle[1])
{
printf("readEvent creation error\n");
MessageBoxA(NULL, "readEvent creation error", "Debug", MB_OK);
CloseDevice(&deviceData);
getchar();
return 5;
}
readPipeHandle[0] = CreateThread(NULL, 0, readPipe, &readPipeParam0, 0, NULL);
readPipeHandle[1] = CreateThread(NULL, 0, readPipe, &readPipeParam1, 0, NULL);
if (!readPipeHandle[0] || !readPipeHandle[1])
{
printf("readPipe creation error\n");
MessageBoxA(NULL, "readPipe creation error", "Debug", MB_OK);
CloseDevice(&deviceData);
getchar();
return 6;
}
readOverlapped[0].hEvent = readEventHandle[0];
readOverlapped[1].hEvent = readEventHandle[1];
...
Read pipe thread:
DWORD WINAPI readPipe(LPVOID lpParam)
{
BOOL bResult;
ULONG cbReceived = 0;
USB_TYPE_T usbMessageIn;
DWORD waitResult;
uint8_t index = *static_cast<uint8_t*>(lpParam);
BOOLEAN init = FALSE;
UCHAR receiveBuffer[MAX_PACKET_SIZE] = { 0 };
LARGE_INTEGER start[2], end[2], freq;
QueryPerformanceFrequency(&freq);
QueryPerformanceCounter(&start[index]);
uint32_t microDelta;
while (initStatus)
{
waitResult = WaitForSingleObject(readEventHandle[index], INFINITE);
switch (waitResult)
{
case WAIT_OBJECT_0:
if (init)
{
WinUsb_GetOverlappedResult(deviceData.WinusbHandle, &readOverlapped[index], &cbReceived, FALSE);
if (cbReceived > 0)
{
// Read data, set usbMessageIn and add to queue.
strcpy_s(usbMessageIn.string, reinterpret_cast<const char*>(receiveBuffer));
usbMessageIn.len = cbReceived;
mtx.lock();
while (receiveQueue.size() >= RECEIVE_QUEUE_MAX_SIZE)
{
MessageBoxA(NULL, "Message receive queue full", "Debug", MB_OK);
receiveQueue.pop_front();
}
receiveQueue.push_back(usbMessageIn);
#ifdef CREATE_DEBUG_LOG
QueryPerformanceCounter(&end[index]);
microDelta = (end[index].QuadPart - start[index].QuadPart) * 1000000 / freq.QuadPart;
std::string str = usbMessageIn.string;
while (str.length() < 24) str += " ";
fprintf(logFile, "Message %s (Len: %d) (Queue size: %d) (Delta: %6d us) (Thread: %d)\n",
str.c_str(), cbReceived, receiveQueue.size(), microDelta, index);
QueryPerformanceCounter(&start[index]);
#endif
mtx.unlock();
}
}
else
{
init = TRUE;
}
// Create another read request.
std::fill(receiveBuffer, receiveBuffer + sizeof(receiveBuffer), 0);
bResult = ReadFromBulkEndpoint(deviceData.WinusbHandle, &pipeID.PipeInId, MAX_PACKET_SIZE,
NULL, receiveBuffer, &readOverlapped[index]);
if (!bResult && GetLastError() != ERROR_IO_PENDING)
{
printf("Error reading data from endpoint\n");
MessageBoxA(NULL, "Error reading data from endpoint", "Debug", MB_OK);
getchar();
CloseDevice(&deviceData);
usbMessageIn.len = 0;
return 1;
}
break;
default:
MessageBoxA(NULL, "Error handling read event", "Debug", MB_OK);
break;
}
}
return 0;
}
I also commented out everything to do with messagePoller
, since readPipe
handles all of that now.
However, I'm still encountering performance issues.
EDIT 2
Updated readPipe
code above.
I'm seriously starting to wonder whether it is my driver that's the issue or the microcontroller. With a protocol like CAN it would be much easier to tell where the problem lies...
I got my driver to produce a log of all the messages it received as it received them, including additional details such as the time delta between a thread receiving two messages, and which thread is processing the message (I have two). I get an output like this:
Message 000000050FFF00640064 (Len: 20) (Queue size: 1) (Delta: 573120 us) (Thread: 0)
Message 000000070000323232323232 (Len: 24) (Queue size: 1) (Delta: 593050 us) (Thread: 1)
Message 000000070100323232323232 (Len: 24) (Queue size: 1) (Delta: 39917 us) (Thread: 0)
Message 000000090000 (Len: 12) (Queue size: 1) (Delta: 39950 us) (Thread: 1)
Message 0000000B0000 (Len: 12) (Queue size: 1) (Delta: 59842 us) (Thread: 0)
Message 0000000D0FFF001B003A001B (Len: 24) (Queue size: 1) (Delta: 59979 us) (Thread: 1)
Message 0000030D001F (Len: 12) (Queue size: 2) (Delta: 20207 us) (Thread: 0)
Message 0000020D00280024001B002D (Len: 24) (Queue size: 3) (Delta: 227 us) (Thread: 1)
Message 0000000F000F000000000000 (Len: 24) (Queue size: 1) (Delta: 39890 us) (Thread: 0)
Message 0000010F0000 (Len: 12) (Queue size: 2) (Delta: 39902 us) (Thread: 1)
Message 0000001100FF001D001D0020 (Len: 24) (Queue size: 1) (Delta: 19827 us) (Thread: 0)
Message 000001110FFF0020001E001E (Len: 24) (Queue size: 2) (Delta: 19824 us) (Thread: 1)
Message 00000211001E (Len: 12) (Queue size: 3) (Delta: 224 us) (Thread: 0)
Message 0000001300 (Len: 10) (Queue size: 1) (Delta: 19996 us) (Thread: 1)
Message 0000001D4000 (Len: 12) (Queue size: 1) (Delta: 63864 us) (Thread: 0)
Message 0000000D0FFF001600310016 (Len: 24) (Queue size: 1) (Delta: 4025107 us) (Thread: 1)
Message 0000030D001F (Len: 12) (Queue size: 2) (Delta: 3981220 us) (Thread: 0)
Message 0000020D002800240016002D (Len: 24) (Queue size: 3) (Delta: 326 us) (Thread: 1)
Message 0000000D0FFF001600310016 (Len: 24) (Queue size: 1) (Delta: 58885 us) (Thread: 0)
Message 0000030D0024 (Len: 12) (Queue size: 2) (Delta: 58852 us) (Thread: 1)
Message 0000020D0024001F001F0031 (Len: 24) (Queue size: 3) (Delta: 310 us) (Thread: 0)
Message 0000000D0FFF001B0036001B (Len: 24) (Queue size: 1) (Delta: 49755 us) (Thread: 1)
Message 0000030D0024 (Len: 12) (Queue size: 2) (Delta: 49886 us) (Thread: 0)
Message 0000020D00240024001B0036 (Len: 24) (Queue size: 3) (Delta: 447 us) (Thread: 1)
Message 0000000D0FFF001600360016 (Len: 24) (Queue size: 1) (Delta: 49703 us) (Thread: 0)
Message 0000030D001F (Len: 12) (Queue size: 2) (Delta: 49589 us) (Thread: 1)
Message 0000020D001F0024001B002D (Len: 24) (Queue size: 3) (Delta: 357 us) (Thread: 0)
Message 0000000D0FFF001600310016 (Len: 24) (Queue size: 1) (Delta: 49896 us) (Thread: 1)
Message 0000030D001F (Len: 12) (Queue size: 2) (Delta: 49860 us) (Thread: 0)
Message 0000020D0024001F001B002D (Len: 24) (Queue size: 3) (Delta: 315 us) (Thread: 1)
Message 0000000D0FFF00160036001B (Len: 24) (Queue size: 1) (Delta: 49724 us) (Thread: 0)
Message 0000030D001F (Len: 12) (Queue size: 2) (Delta: 49891 us) (Thread: 1)
Message 0000020D00240024001B0031 (Len: 24) (Queue size: 3) (Delta: 452 us) (Thread: 0)
Message 0000000D0FFF001B00360016 (Len: 24) (Queue size: 1) (Delta: 49742 us) (Thread: 1)
Essentially what's happening is that a number of 'startup' messages are sent between the application and the device when the application is first launched. After about 4 seconds I then use the application to request a steady stream of messages from the device consisting of a group of 4 messages that are sent at 50ms intervals:
- 0000000D...
- 0000010D...
- 0000020D...
- 0000030D...
It looks like the driver is actually performing really well, since we can see it consistently reporting values in the few hundreds of microseconds at the points we expect. However:
- The 30D... message seems to be consistently arriving before the 20D... message
- The 10D... message is consistently getting dropped
This could be a problem with the microcontroller code. I'm using double-buffered endpoints so it could make sense why messages are arriving out of order.
One thing is that I'm not explicitly waiting for ACKs on the bulk IN pipe in my microcontroller code before sending subsequent messages. That might be the issue, although I've tried doing that previously and it didn't seem to have much of an effect.
Below are screenshots of USBLyzer output if they help at all (taken during a different run so the data won't be exactly identical but still quite similar). They're screenshots just because the log files are poorly formatted.
EDIT 3
It seems like my microcontroller is sending messages to the driver with only around 30 microseconds between each message (at max rate), whereas based on the logs it seems like it takes somewhere between 200-500 microseconds for the driver to process one message. That's quite a discrepancy. The real issue is that something lower-level than my driver software is sending ACKs to the microcontroller at the same rate that it's sending messages even though my driver can't keep up, so I can't throttle it based on that. It seems like I might need to explicitly send a message from the driver to the microcontroller on the BULK Out pipe saying "I'm ready to receive another message" every time the driver receives a message, but that seems like it'll really slow things down. Is there any better alternative?