I have a multi-threaded dll for a third-party application. My dll invokes messages onto the main UI thread by calling SendMessage with a custom message type:

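typedef void (*CallbackFunctionType)();
static UINT _wm;        // custom message ID from RegisterWindowMessage
static HWND _hwnd;      // hidden window that receives the invoke messages
static DWORD threadId;  // ID of the thread that created _hwnd

// Forward declaration so Initialize() can reference the window procedure.
static LRESULT CALLBACK wndProcedure(
    HWND hWnd, UINT Msg, WPARAM wParam, LPARAM lParam);

void Initialize()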
{
    _wm = RegisterWindowMessage("MyInvokeMessage");
    WNDCLASS wndclass = {0};
    wndclass.hInstance = (HINSTANCE)&__ImageBase;
    wndclass.lpfnWndProc = wndProcedure;
    wndclass.lpszClassName = "MessageOnlyWindow";
    RegisterClass(&wndclass);
    _hwnd = CreateWindow(
         "MessageOnlyWindow",
         NULL,
         0,                 // dwStyle: no style bits, window is never shown
         CW_USEDEFAULT,
         CW_USEDEFAULT,
         CW_USEDEFAULT,
         CW_USEDEFAULT,
         NULL,
         NULL,
         (HINSTANCE)&__ImageBase,
         NULL);
    threadId = GetCurrentThreadId();
}

void InvokeSync(CallbackFunctionType funcPtr)
{
    if (_hwnd != NULL && threadId != GetCurrentThreadId())
        SendMessage(_hwnd, _wm, 0, (LPARAM)funcPtr);
    else
        funcPtr();
}
static LRESULT CALLBACK wndProcedure(
    HWND hWnd, UINT Msg, WPARAM wParam, LPARAM lParam)
{
    if (Msg == _wm)
    {
        CallbackFunctionType funcPtr = (CallbackFunctionType)lParam;
        (*funcPtr)();
    }
    return DefWindowProc(hWnd, Msg, wParam, lParam);
}

The application is MDI, and I'm performing open document/extract contents/process in background/save on a bunch of documents, so it is constantly switching active documents and opening and closing new ones.

My issue is that sometimes the processing gets stuck when it's trying to invoke messages onto the main thread, using the above InvokeSync() function.

When I pause it in a debugger, I see the main thread has this call stack:

user32.dll!_NtUserGetMessage@16() + 0x15 bytes
user32.dll!_NtUserGetMessage@16() + 0x15 bytes
mfc42.dll!CWinThread::PumpMessage() + 0x16 bytes
// the rest is normal application stuff

And the background thread that's locked up has a call stack like this:

user32.dll!_NtUserMessageCall@28() + 0x15 bytes
user32.dll!_NtUserMessageCall@28() + 0x15 bytes
mydll!InvokeSync(funcPtr)
// the rest is expected dll stuff

So it appears to be getting stuck on the "SendMessage()" call, but as far as I can see, the message pump on the main thread is sitting there idle.

However, if I manually click on an inactive document (to make it active), somehow this wakes everything up, and the SendMessage() event finally goes through, and it resumes processing.

The main application uses Microsoft fibers, one fiber per document. Could my SendMessage be getting stuck on a background fiber that has been switched out, for example one that received the message right before going inactive, so that only forcing a context switch lets that fiber get around to handling its messages? I really don't understand how threads and fibers interact with each other, so I'm grasping at straws.

What could cause messages to sit there unhandled like this? More importantly, is there a way to prevent this situation from occurring? Or at least, how do I debug such a situation?

Bryce Wagner
  • Most likely you are sending the message to a window that is owned by a thread without a message pump – David Heffernan Jun 23 '15 at 15:34
  • No, I posted the call stack of the thread that owns the HWND; it's sitting there idle, waiting on _NtUserGetMessage(). Probably 9999/10000 messages get processed correctly; it's just that occasionally one of them sits there hanging and forgotten, waiting to be processed. – Bryce Wagner Jun 23 '15 at 15:39
  • Based on your description, the message never makes it to the queue. Sometimes this can be a misleading symptom (meaning something else is going on which was not reported properly). Does your output window display any notable items? – Jeff Jun 23 '15 at 15:43
  • Just a bunch of messages "The thread 'Win32 Thread' (0x???) has exited with code 0 (0x0)." – Bryce Wagner Jun 23 '15 at 15:57
  • SendMessage() is [dangerous](http://stackoverflow.com/a/29603742/17034) and liable to cause deadlock. Use PostMessage() instead. – Hans Passant Jun 23 '15 at 16:07
  • I don't understand the opening sentences of the description. Are you injecting a DLL into third-party code? In what sense is your DLL multi-threaded? If this is a third-party app, how do you know it uses fibers? Are you creating your message-only window on the main UI thread or on a random thread? – Adrian McCarthy Jun 23 '15 at 16:24
  • Remember that SendMessage doesn't go through the message pump. Two messages sent at the same time (via SendMessage) from different threads have to be serialized. You're probably deadlocking here. – Adrian McCarthy Jun 23 '15 at 16:27
  • The dll is loaded as a plugin by the third party app, and the creators of this application have well documented the fact that it uses fibers internally, with one fiber per document, but not a lot of information on the implications of that. The dll is multithreaded because it goes off and does a bunch of processing in multiple threads, but all interactions with the main application need to be invoked back onto the main application thread. – Bryce Wagner Jun 23 '15 at 16:27
  • @AdrianMcCarthy There is only one sending thread at the time of the hang, and the receiving (main) thread is sitting idle at _NtUserGetMessage(), and that main thread is still processing other messages (keyboard, mouse, etc), just not the "SendMessage()" from the background thread. – Bryce Wagner Jun 23 '15 at 16:30
  • @HansPassant The main reason I used SendMessage was because it's "guaranteed" delivery, unlike PostMessage, where messages can be discarded if they overflow the queue. So that means I have to handle all my own synchronization. But at this point, PostMessage with retry and timeout might be the only chance of it working. Just really messy. – Bryce Wagner Jun 23 '15 at 16:35
  • The queue is 10000 long. How fast are you sending/posting messages? – Martin James Jun 23 '15 at 18:45
  • Also, well before the queue fills up, you will find that your GUI is pretty well dead to user interaction: it won't move or resize, timers don't fire, KB/mouse input is not processed, etc. Still, almost any scheme is better than SendMessage(). – Martin James Jun 23 '15 at 18:54
  • @MartinJames I would probably never have more than a dozen or so of my own messages at the most extreme. But the fact that PostMessage doesn't guarantee delivery means I have to implement a retry/timeout mechanism or there's the potential to deadlock the whole thing in a way that's my fault. – Bryce Wagner Jun 23 '15 at 19:07
  • You still haven't clarified the problem. Are you injecting a DLL into third-party code? In what sense is your DLL multi-threaded? If this is a third-party app, how do you know it uses fibers? – Adrian McCarthy Jun 23 '15 at 23:16
  • @BryceWagner if 'a dozen or so' is all, you can assume that PostMessage guarantees delivery. Once successfully queued, PostMessage does not lose messages, ever. It would be absolutely disastrous for Windows if it ever did. – Martin James Jun 24 '15 at 01:22
  • @AdrianMcCarthy No injection, the application has a plugin API, and it loads my DLL and calls an entry point. In this entry point, I create a Window for invoking messages on and register some callbacks. When it calls one of those callbacks, I spawn a new thread for my code and immediately return control to the application. Then when my code wants to talk to the application, it uses the Windows message queue to synchronously invoke callbacks onto the main thread. – Bryce Wagner Jun 24 '15 at 14:15
  • @MartinJames SendMessage shouldn't be losing messages like that either... But using PostMessage and doing my own synchronization seems to have fixed my problem. – Bryce Wagner Jun 24 '15 at 14:17

2 Answers

I went ahead and implemented my own message queue, with a message format that uses one semaphore to signal when a message has been received and another to signal when it has been completed. The sender repeats PostMessage every second until "message received" is signalled, then waits for "message complete" with an infinite timeout.

Any extra PostMessages are ignored, because they no longer carry a payload to execute; they just tell the main thread to check the queue for incoming events.

Since I made these changes, I have not run into the situation again. As best I can tell, the sent message must be ending up on the queue of a switched-out fiber, where it is forgotten until that fiber is switched in again. By reposting the message, the sender can keep retrying until the active fiber notices the message sitting there.
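
A minimal sketch of the scheme described above, written with portable standard C++ primitives rather than Win32 semaphores. It is an illustration under assumptions, not the code actually used in the plugin. The names `InvokeQueue`, `Invocation`, `Drain` and the `wake` callback are hypothetical; `wake` stands in for something like `PostMessage(_hwnd, _wm, 0, 0)`, and the two flags stand in for the "message received" and "message complete" semaphores.

```cpp
#include <chrono>
#include <condition_variable>
#include <deque>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>

// One queued request. `received` and `completed` play the role of the
// "message received" and "message complete" semaphores.
struct Invocation {
    std::function<void()> callback;
    std::mutex m;
    std::condition_variable cv;
    bool received = false;
    bool completed = false;
};

class InvokeQueue {
public:
    // `wake` is a hypothetical stand-in for PostMessage: it carries no
    // payload and only asks the main thread to call Drain().
    explicit InvokeQueue(std::function<void()> wake,
                         std::chrono::milliseconds retry = std::chrono::seconds(1))
        : wake_(std::move(wake)), retry_(retry) {}

    // Background thread: queue the callback, then block until it has run.
    void InvokeSync(std::function<void()> callback) {
        auto inv = std::make_shared<Invocation>();
        inv->callback = std::move(callback);
        {
            std::lock_guard<std::mutex> lock(queueMutex_);
            pending_.push_back(inv);
        }
        // Repeat the wake-up until the main thread acknowledges the request.
        for (;;) {
            wake_();
            std::unique_lock<std::mutex> lock(inv->m);
            if (inv->cv.wait_for(lock, retry_, [&] { return inv->received; }))
                break;
        }
        // Once acknowledged, wait for completion with no timeout.
        std::unique_lock<std::mutex> lock(inv->m);
        inv->cv.wait(lock, [&] { return inv->completed; });
    }

    // Main thread: call this when a wake-up arrives. A wake-up that finds
    // the queue empty does nothing, so duplicate wake-ups are harmless.
    void Drain() {
        for (;;) {
            std::shared_ptr<Invocation> inv;
            {
                std::lock_guard<std::mutex> lock(queueMutex_);
                if (pending_.empty()) return;
                inv = pending_.front();
                pending_.pop_front();
            }
            { std::lock_guard<std::mutex> lock(inv->m); inv->received = true; }
            inv->cv.notify_all();
            inv->callback();
            { std::lock_guard<std::mutex> lock(inv->m); inv->completed = true; }
            inv->cv.notify_all();
        }
    }

private:
    std::function<void()> wake_;
    std::chrono::milliseconds retry_;
    std::mutex queueMutex_;
    std::deque<std::shared_ptr<Invocation>> pending_;
};
```

In the plugin, `wake` would presumably post the registered message to the invoke window, and the window procedure would call `Drain()` when it sees that message. Keeping the payload in the queue rather than in the message is what makes the repeated posts safe: a lost or delayed wake-up costs one retry interval, and a duplicate one finds nothing to do.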

Bryce Wagner

Check the arguments to GetMessage. The third and fourth are a message ID range (wMsgFilterMin and wMsgFilterMax). Your message will happily sit in the queue if its ID is outside this range.

MSalters
  • Good advice, but the GetMessage is inside mfc42.dll PumpMessage(), and it does get delivered after a fiber context switch, so I think range filtering is unlikely to be the cause. – Bryce Wagner Jun 24 '15 at 14:33