How does reciprocal SendMessage-ing between two applications work?

Question

Suppose I have 2 applications, A and B. Each one of them creates one window in the main thread, and has no other threads.

When the "close" button of the window of application A is pressed, the following happens:

1. Application A receives a WM_CLOSE message and handles it like this:
```
DestroyWindow(hWnd_A);
return 0;
```

2. On WM_DESTROY, application A does:
```
SendMessage(hWnd_B, WM_REGISTERED_MSG, 0, 0); //key line!!
PostQuitMessage(0);
return 0;
```

3. On WM_REGISTERED_MSG, application B runs:
```
SendMessage(hWnd_A, WM_ANOTHER_REGISTERED_MSG, 0, 0);
return 0;
```

4. On WM_ANOTHER_REGISTERED_MSG, application A runs:
```
OutputDebugString("Cannot print this");
return 0;
```

And that's it.

From MSDN, I read that when a message is sent to a window created by another thread, the calling thread is blocked, and it can only process non-queued messages.

Now, since the above code works and does not hang, I guess that the call to SendMessage from application B (point 3) sends a non-queued message to application A's window procedure, which processes it in the context of application B's main thread. Indeed, no debug output is displayed with OutputDebugString in point 4.

This is also proved by the fact that replacing SendMessage with SendMessageTimeout and the SMTO_BLOCK flag in the key line of point 2 makes the whole thing actually block (see the SendMessage documentation).

Then, my questions are:

Actually, are non-queued messages just simple direct calls to the window procedure made by SendMessage in process B?
How does SendMessage know when to send queued or non-queued messages?

UPDATE

Still, I do not understand how A processes WM_ANOTHER_REGISTERED_MSG. I would expect that, when that message is sent, A's thread is still waiting for its own call to SendMessage to return.

Any insight?

SUGGESTION FOR THE READERS

I suggest reading Adrian's answer as an introduction to RbMm's, which follows the same line of thought but goes into more detail.

"*I guess that the call to `SendMessage` from application B (point 3) sends a non-queued message to application A's window procedure, which processes it **in the context of application B's main thread**.*" - that is not true at all. When `SendMessage()` sends a message to a window in another thread, `SendMessage()` blocks the calling thread, waiting for the receiving thread to process the message. The message is processed in the context of the receiving thread, not the context of the sending thread. — Remy Lebeau, Jan 30 '18 at 19:41
Thank you, I will update the question, putting A's `SendMessage` before `DestroyWindow`. I still do not understand how A can process `WM_REGISTERED_MSG` in its own thread, because A's thread should be waiting for its call to `SendMessage` to return... shouldn't it? — Daniele, Jan 30 '18 at 19:52
See [About Messages and Message Queues](https://msdn.microsoft.com/en-us/library/windows/desktop/ms644927.aspx). `A` sends `WM_REGISTERED_MSG` and blocks until `B` processes it. `B` sends `WM_ANOTHER_REGISTERED_MSG` before exiting from `WM_REGISTERED_MSG`. `A`'s blocked `SendMessage()` receives and processes `WM_ANOTHER_REGISTERED_MSG` and then goes back to waiting for `WM_REGISTERED_MSG` to finish being processed. `PostMessage()` posts queued messages, `SendMessage()` sends non-queued messages. — Remy Lebeau, Jan 30 '18 at 20:01
Michael, please consider the updated code. Does it fix that? — Daniele, Jan 30 '18 at 20:12
This question appears to be rather theoretical. What is the use case of this? — zett42, Jan 30 '18 at 20:27
An application that controls the execution of other applications could use the above approach (or a similar one) to notify the controlled applications to clean up and exit. — Daniele, Jan 30 '18 at 20:39
`SendMessage` always sends nonqueued messages. The behavior you describe works fine. — RbMm, Jan 30 '18 at 20:42
@RbMm : Agreed. Found the bug in my test case and already deleted that comment. — Michael Gunter, Jan 30 '18 at 22:10
`//key line!!` is wrong: your first code was correct. You can call `SendMessage` right from `WM_DESTROY`. — RbMm, Jan 31 '18 at 00:00
I guess you haven't heard of `PostMessage`. Makes the question "queued or non-queued" trivially easy -- if `PostMessage` it is queued, if `SendMessage` it is non-queued. — Ben Voigt, Jan 31 '18 at 20:17

RbMm · Accepted Answer · 2018-01-30T23:21:59.160

The behavior you describe really does work.

How does SendMessage know when to send queued or non-queued messages?

From the MSDN article "Nonqueued Messages":

Some functions that send nonqueued messages are ... SendMessage ...

So SendMessage always sends nonqueued messages.

And from the SendMessage documentation:
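To make the queued/nonqueued distinction concrete from a single thread's point of view, here is a toy, portable C++ model. It is not the Win32 API: `ToyThread`, `Post`, `Send`, `PumpAll` and `HandlingOrder` are hypothetical names. A posted message waits in the thread's queue until a GetMessage-style loop retrieves it, whereas a same-thread send is effectively a direct call to the window procedure. A send to another thread's window additionally involves the blocking handshake described below.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <vector>

// Toy model of a single thread (not Win32): posted messages wait in a
// queue, while a same-thread send calls the window procedure directly.
struct ToyThread {
    std::deque<int> postedQueue;           // queued (PostMessage-style) messages
    std::function<long(int)> wndProc;

    void Post(int msg) { postedQueue.push_back(msg); }   // returns immediately
    long Send(int msg) { return wndProc(msg); }          // nonqueued: direct call

    // GetMessage/DispatchMessage-style loop: drain the queue in FIFO order.
    void PumpAll() {
        while (!postedQueue.empty()) {
            int msg = postedQueue.front();
            postedQueue.pop_front();
            wndProc(msg);
        }
    }
};

// Returns the order in which the window procedure saw the messages.
std::vector<int> HandlingOrder() {
    std::vector<int> handled;
    ToyThread t;
    t.wndProc = [&](int msg) -> long { handled.push_back(msg); return 0; };
    t.Post(1);      // queued: not handled yet
    t.Send(2);      // nonqueued: handled before Send returns
    t.Post(3);
    t.PumpAll();    // 1 and 3 are handled only now
    return handled; // {2, 1, 3}
}
```

The sent message overtakes the earlier posted one because it never enters the queue at all.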

However, the sending thread will process incoming nonqueued messages while waiting for its message to be processed.

This means that a window procedure can be called from inside a SendMessage call, to process incoming messages sent via SendMessage from another thread. How is this implemented?

When we call SendMessage for a window that belongs to another thread, the call enters kernel mode. The kernel always saves the user-mode stack pointer and switches to the kernel stack. When returning from kernel to user mode, the kernel usually resumes at the point from which user mode called it, on the saved stack. But there are exceptions. One of them is:

```
NTSYSCALLAPI
NTSTATUS
NTAPI
KeUserModeCallback
(
    IN ULONG RoutineIndex,
    IN PVOID Argument,
    IN ULONG ArgumentLength,
    OUT PVOID* Result,
    OUT PULONG ResultLength
);
```

This is an exported but undocumented API. However, win32k.sys uses it all the time to call window procedures. How does this API work?

First, it allocates an additional kernel stack frame below the current one. Then it takes the saved user-mode stack pointer and copies some data (the arguments) below it. Finally, it exits from kernel to user mode, but not to the point from which the kernel was called; instead, it goes to a special function exported from ntdll.dll:

```
void
KiUserCallbackDispatcher
(
    IN ULONG RoutineIndex,
    IN PVOID Argument,
    IN ULONG ArgumentLength
);
```

The user-mode stack it runs on lies below the stack pointer that was saved when we entered the kernel earlier. KiUserCallbackDispatcher calls RtlGetCurrentPeb()->KernelCallbackTable[RoutineIndex](Argument, ArgumentLength), which is usually some function in user32.dll. That function in turn calls the corresponding window procedure. The window procedure can call into the kernel again: because KeUserModeCallback allocated an additional kernel frame, we enter the kernel inside this new frame and do not damage the previous one. When the window procedure returns, another special API is called:

```
__declspec(noreturn)
NTSTATUS
NTAPI
ZwCallbackReturn
(
    IN PVOID Result OPTIONAL,
    IN ULONG ResultLength,
    IN NTSTATUS Status
);
```

If no error occurs, this API never returns. On the kernel side, the allocated kernel frame is deallocated and we return to the previous kernel stack inside KeUserModeCallback, so KeUserModeCallback finally returns to the point from which it was called. Then we go back to user mode, exactly to the point where we called the kernel, on the same stack.

This is exactly how a window procedure gets called from inside a call to GetMessage. The call flow is:

```
GetMessage...
--- kernel mode ---
KeUserModeCallback...
push additional kernel stack frame
--- user mode --- (stack below the point from which GetMessage entered the kernel)
KiUserCallbackDispatcher
WindowProc
ZwCallbackReturn
--- kernel mode ---
pop kernel stack frame
...KeUserModeCallback
--- user mode ---
...GetMessage
```

Exactly the same happens with a blocking SendMessage.

So when thread_A sends message_1 to thread_B via SendMessage, it enters the kernel, signals the GUI event_B on which thread_B is waiting, and begins waiting on the GUI event_A of its own thread. If thread_B is executing message-retrieval code (GetMessage or PeekMessage), KeUserModeCallback is called in thread_B, and as a result its window procedure runs. There it calls SendMessage to send message_2 back to thread_A: this sets event_A, on which thread_A is waiting, and begins waiting on event_B. thread_A wakes up and calls KeUserModeCallback, so its window procedure is called with this message. When it returns (assume that this time it does not call SendMessage again), thread_A signals event_B and begins waiting on event_A again. Now thread_B returns from SendMessage and then from its window procedure, which finishes handling the original message_1, and sets event_A. thread_A wakes up and returns from SendMessage. The call flow is:

```
thread_A                        thread_B
----------------------------------------------------
                                GetMessage...
                                wait(event_B)
SendMessage(WM_B)...
set(event_B)
wait(event_A)
                                begin process WM_B...
                                KeUserModeCallback...
                                    KiUserCallbackDispatcher
                                    WindowProc(WM_B)...
                                    SendMessage(WM_A)...
                                    set(event_A)
                                    wait(event_B)
begin process WM_A...
KeUserModeCallback...
    KiUserCallbackDispatcher
    WindowProc(WM_A)...
    ...WindowProc(WM_A)
    ZwCallbackReturn
...KeUserModeCallback
set(event_B)
...end process WM_A
wait(event_A)
                                    ...SendMessage(WM_A)
                                    ...WindowProc(WM_B)
                                    ZwCallbackReturn
                                ...KeUserModeCallback
                                set(event_A)
                                ...end process WM_B
                                wait(event_B)
...SendMessage(WM_B)
                                ...GetMessage
```
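The event choreography in the call flow can be mimicked with a toy, portable C++ model. This is not how win32k is implemented. A `std::condition_variable` stands in for each per-thread GUI event, a single "sent message" slot per thread replaces the real sent-message list, and all names (`GuiThread`, `SendMsg`, `MessageLoop`, `RunReciprocalDemo`) and message ids are hypothetical. What it does reproduce is the observable rule: a thread blocked in a cross-thread send still dispatches messages sent to it, and every handler runs on the receiving thread, not the sender's.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Toy stand-in for one GUI thread; cv plays the role of event_A / event_B.
struct GuiThread {
    std::condition_variable cv;
    bool hasIncoming = false;   // a message was sent to this thread
    int incomingMsg = 0;
    GuiThread* sender = nullptr;
    bool hasReply = false;      // our own pending send was answered
    long reply = 0;
    bool quit = false;
    std::function<long(int)> wndProc;
};
std::mutex g_lock;  // one lock guards all GuiThread fields, for simplicity

// Runs the sent message on the *receiving* OS thread, then wakes the sender
// with the result. The caller holds lk.
void DispatchIncoming(GuiThread& self, std::unique_lock<std::mutex>& lk) {
    int msg = self.incomingMsg;
    GuiThread* sender = self.sender;
    self.hasIncoming = false;
    lk.unlock();
    long r = self.wndProc(msg);
    lk.lock();
    sender->reply = r;
    sender->hasReply = true;
    sender->cv.notify_all();
}

// Cross-thread "SendMessage": signal the target, then wait for the reply
// while still dispatching messages sent to *this* thread.
long SendMsg(GuiThread& self, GuiThread& target, int msg) {
    std::unique_lock<std::mutex> lk(g_lock);
    target.incomingMsg = msg;
    target.sender = &self;
    target.hasIncoming = true;
    target.cv.notify_all();
    for (;;) {
        self.cv.wait(lk, [&] { return self.hasReply || self.hasIncoming; });
        if (self.hasReply) { self.hasReply = false; return self.reply; }
        DispatchIncoming(self, lk);   // nested handler call inside the wait
    }
}

// Stand-in for thread_B's GetMessage loop (only sent messages are modeled).
void MessageLoop(GuiThread& self) {
    std::unique_lock<std::mutex> lk(g_lock);
    for (;;) {
        self.cv.wait(lk, [&] { return self.hasIncoming || self.quit; });
        if (!self.hasIncoming) return;
        DispatchIncoming(self, lk);
    }
}

struct DemoResult {
    long result = 0;            // what A's outer SendMsg returned
    bool handledOnA = false;    // did WM_ANOTHER run on A's own thread?
    std::vector<std::string> trace;
};

DemoResult RunReciprocalDemo() {
    const int kRegistered = 1, kAnother = 2;   // hypothetical message ids
    GuiThread a, b;
    DemoResult res;
    const std::thread::id aId = std::this_thread::get_id();  // A = caller

    a.wndProc = [&](int msg) -> long {
        if (msg != kAnother) return 0;
        res.handledOnA = (std::this_thread::get_id() == aId);
        res.trace.push_back("A: WM_ANOTHER");
        return 42;
    };
    b.wndProc = [&](int msg) -> long {
        if (msg != kRegistered) return 0;
        res.trace.push_back("B: WM_REGISTERED begin");
        long r = SendMsg(b, a, kAnother);      // reciprocal send back to A
        res.trace.push_back("B: WM_REGISTERED end");
        return r + 1;
    };

    std::thread tb(MessageLoop, std::ref(b));
    res.result = SendMsg(a, b, kRegistered);   // A blocks here
    {
        std::lock_guard<std::mutex> lk(g_lock);
        b.quit = true;
        b.cv.notify_all();
    }
    tb.join();
    return res;
}
```

`RunReciprocalDemo()` yields `result == 43` (B adds 1 to A's 42), `handledOnA == true`, and the trace "B begin, A, B end": A's handler ran nested inside B's processing of the first message while A was still blocked in its outer send. One slot per thread suffices only because this two-thread ping-pong is strictly nested; a real thread can have several senders waiting on it at once.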

Also note that when we handle the WM_DESTROY message, the window is still valid and can process incoming messages. We can implement the following demo. First, we do not need two processes: a single process with two threads is enough. A special registered message is not needed either; why not use, say, WM_APP as the test message?

1. thread_A, from its WM_CREATE, creates thread_B and passes its own window handle to it.
2. thread_B creates its own window, but on WM_CREATE simply returns -1 (so that window creation fails).
3. thread_B, from WM_DESTROY, calls SendMessage(hwnd_A, WM_APP, 0, hwnd_B) (passing its own hwnd as lParam).
4. thread_A gets WM_APP and calls SendMessage(hwnd_B, WM_APP, 0, 0).
5. thread_B gets WM_APP (so its WindowProc is called recursively, on the stack below WM_DESTROY).
6. thread_B prints "Cannot print this" and returns its own thread ID to thread_A.
7. thread_A returns from its SendMessage call and returns its own thread ID to thread_B.
8. thread_B returns from its SendMessage call inside WM_DESTROY.

```
#include <windows.h>
#include <intrin.h>   // _AddressOfReturnAddress

// Linker-provided symbol marking this module's base address (its HINSTANCE).
extern "C" IMAGE_DOS_HEADER __ImageBase;
// DbgPrint is exported by ntdll.dll but not declared in windows.h; link ntdll.lib.
extern "C" ULONG __cdecl DbgPrint(PCSTR Format, ...);

ULONG WINAPI ThreadProc(PVOID hWnd);

struct WNDCTX 
{
    HANDLE hThread;     // thread_B handle, owned by thread_A's window
    HWND hWndSendTo;    // set only for thread_B's window: hwnd of thread_A
};

LRESULT CALLBACK WindowProc(HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
{
    WNDCTX* ctx = reinterpret_cast<WNDCTX*>(GetWindowLongPtrW(hWnd, GWLP_USERDATA));

    switch (uMsg)
    {
    case WM_NULL:
        DestroyWindow(hWnd);
        break;
    case WM_APP:
        DbgPrint("%x:%p>WM_APP:(%p, %p)\n", GetCurrentThreadId(), _AddressOfReturnAddress(), wParam, lParam);

        if (lParam)
        {
            // thread_A: send WM_APP back to thread_B, which is still blocked
            // inside its own SendMessage call from WM_DESTROY
            DbgPrint("%x:%p>Send WM_APP(0)\n", GetCurrentThreadId(), _AddressOfReturnAddress());
            LRESULT r = SendMessageW((HWND)lParam, WM_APP, 0, 0);
            DbgPrint("%x:%p>SendMessage=%p\n", GetCurrentThreadId(), _AddressOfReturnAddress(), r);
            PostMessageW(hWnd, WM_NULL, 0, 0);   // destroy our window later
        }
        else
        {
            // thread_B: called recursively, below its WM_DESTROY frame
            DbgPrint("%x:%p>Cannot print this\n", GetCurrentThreadId(), _AddressOfReturnAddress());
        }

        return GetCurrentThreadId();

    case WM_DESTROY:

        if (HANDLE hThread = ctx->hThread)
        {
            WaitForSingleObject(hThread, INFINITE);
            CloseHandle(hThread);
        }

        if (HWND hWndSendTo = ctx->hWndSendTo)
        {
            DbgPrint("%x:%p>Send WM_APP(%p)\n", GetCurrentThreadId(), _AddressOfReturnAddress(), hWnd);
            LRESULT r = SendMessageW(hWndSendTo, WM_APP, 0, (LPARAM)hWnd);
            DbgPrint("%x:%p>SendMessage=%p\n", GetCurrentThreadId(), _AddressOfReturnAddress(), r);
        }
        break;

    case WM_NCCREATE:
        SetLastError(0);

        SetWindowLongPtrW(hWnd, GWLP_USERDATA, 
            reinterpret_cast<LONG_PTR>(reinterpret_cast<CREATESTRUCTW*>(lParam)->lpCreateParams));

        if (GetLastError())
        {
            return 0;
        }
        break;

    case WM_CREATE:

        if (ctx->hWndSendTo)
        {
            // thread_B: fail creation; CreateWindowEx then destroys the
            // window, which sends WM_DESTROY
            return -1;
        }
        if (ctx->hThread = CreateThread(0, 0, ThreadProc, hWnd, 0, 0))
        {
            break;
        }
        return -1;

    case WM_NCDESTROY:
        PostQuitMessage(0);
        break;
    }

    return DefWindowProcW(hWnd, uMsg, wParam, lParam);
}

static const WNDCLASSW wndcls = { 
    0, WindowProc, 0, 0, (HINSTANCE)&__ImageBase, 0, 0, 0, 0, L"lpszClassName" 
};

ULONG WINAPI ThreadProc(PVOID hWndSendTo)
{
    WNDCTX ctx = { 0, (HWND)hWndSendTo };

    CreateWindowExW(0, wndcls.lpszClassName, 0, 0, 0, 0, 0, 0, HWND_MESSAGE, 0, 0, &ctx);

    return 0;
}

void DoDemo()
{
    DbgPrint("%x>test begin\n", GetCurrentThreadId());

    if (RegisterClassW(&wndcls))
    {
        WNDCTX ctx = { };

        if (CreateWindowExW(0, wndcls.lpszClassName, 0, 0, 0, 0, 0, 0, HWND_MESSAGE, 0, 0, &ctx))
        {
            MSG msg;

            while (0 < GetMessageW(&msg, 0, 0, 0))
            {
                DispatchMessageW(&msg);
            }
        }

        UnregisterClassW(wndcls.lpszClassName, (HINSTANCE)&__ImageBase);
    }

    DbgPrint("%x>test end\n", GetCurrentThreadId());
}
```

I got the following output:

```
d94>test begin
6d8:00000008884FEFD8>Send WM_APP(0000000000191BF0)
d94:00000008880FF4F8>WM_APP:(0000000000000000, 0000000000191BF0)
d94:00000008880FF4F8>Send WM_APP(0)
6d8:00000008884FEB88>WM_APP:(0000000000000000, 0000000000000000)
6d8:00000008884FEB88>Cannot print this
d94:00000008880FF4F8>SendMessage=00000000000006D8
6d8:00000008884FEFD8>SendMessage=0000000000000D94
d94>test end
```

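The most interesting thing to look at is the stack trace of thread_B when its window procedure is called recursively for WM_APP (note the lower frame address, ...EB88 versus ...EFD8 for the WM_DESTROY frame).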

Thank you very much. I find this is the most complete answer, and it fixes many holes in my understanding of the underlying behavior of Windows API. — Daniele, Jan 31 '18 at 19:58

Adrian McCarthy · Answer 2 · 2018-01-30T22:48:20.440

Still, I do not understand how A processes WM_ANOTHER_REGISTERED_MSG. I would expect that, when that message is sent, A's thread is still waiting for its own call to SendMessage to return.

The SendMessage in A is waiting for the message it sent (from A to B) to complete, but, while it's waiting, it's able to dispatch messages sent from other threads to this thread.

When SendMessage is called for a window on the same thread, we think of it like a chain of function calls that eventually leads to the target windowproc and eventually returns to the caller.

But when the message crosses thread boundaries, it's not that simple. It becomes like a client-server application. SendMessage packages up the message and signals the target thread that it has a message to process. At that point, it waits.

The target thread eventually (we hope) reaches a yield point where it checks that signal, gets the message and processes it. The target thread then signals that it's done the work.

The original thread sees the "I'm done!" signal and returns the result value. To the caller of SendMessage, it looks like it was just a function call, but it was actually choreographed to marshal the message over to the other thread and marshal the result back.

Several Windows API calls are "yield points," places that check to see if there's a message being sent to the current thread from another thread. The most well-known ones are GetMessage and PeekMessage, but certain types of waits--including the wait inside a SendMessage--are also yield points. It's this yield point that makes it possible for A to respond to the message sent back from B all while waiting for B to finish processing the first message.

Here's part of the call stack for A when it receives the WM_ANOTHER_REGISTERED_MSG back from B (step 4):

```
A.exe!MyWnd::OnFromB(unsigned int __formal, unsigned int __formal, long __formal, int & __formal)
A.exe!MyWnd::ProcessWindowMessage(HWND__ * hWnd, unsigned int uMsg, unsigned int wParam, long lParam, long & lResult, unsigned long dwMsgMapID)
A.exe!ATL::CWindowImplBaseT<ATL::CWindow,ATL::CWinTraits<114229248,262400> >::WindowProc(HWND__ * hWnd, unsigned int uMsg, unsigned int wParam, long lParam)
atlthunk.dll!AtlThunk_Call(unsigned int,unsigned int,unsigned int,long)
atlthunk.dll!AtlThunk_0x00(struct HWND__ *,unsigned int,unsigned int,long)
user32.dll!__InternalCallWinProc@20()
user32.dll!UserCallWinProcCheckWow()
user32.dll!DispatchClientMessage()
user32.dll!___fnDWORD@4()
ntdll.dll!_KiUserCallbackDispatcher@12()
user32.dll!SendMessageW()
A.exe!MyWnd::OnClose(unsigned int __formal, unsigned int __formal, long __formal, int & __formal)
```

You can see the OnClose is still inside SendMessageW, but, nested within that, it's getting the callback message from B and routing that to A's window procedure.
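The yield point inside A's wait is what keeps the reciprocal send from deadlocking, and it is also why the question's SMTO_BLOCK experiment hangs. Below is a toy, portable C++ model (not Win32; `Slot`, `WaitForReply` and `ReciprocalSend` are hypothetical names, and the timeout exists only so the blocked case terminates). It compares a wait that services incoming sent messages with one that does not, which is roughly what the SMTO_BLOCK flag of SendMessageTimeout requests.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Toy per-thread state: one pending "sent message" flag plus a reply flag.
struct Slot {
    std::condition_variable cv;
    bool incoming = false;   // a message was sent to this thread
    bool replied = false;    // this thread's own send was answered
};
std::mutex g_m;

// Waits for the reply to a send. With yieldWhileWaiting == false (roughly
// what SMTO_BLOCK asks for) incoming sent messages are not serviced.
// Returns true if the reply arrived before the timeout.
bool WaitForReply(Slot& self, Slot& peer, bool yieldWhileWaiting,
                  std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lk(g_m);
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        bool ready = self.cv.wait_until(lk, deadline, [&] {
            return self.replied || (yieldWhileWaiting && self.incoming);
        });
        if (!ready) return false;            // timed out
        if (self.replied) return true;
        self.incoming = false;               // "dispatch" the peer's message
        peer.replied = true;                 // and answer it
        peer.cv.notify_all();
    }
}

// A sends to B; while handling it, B sends back to A and waits with no
// timeout (plain SendMessage). Returns whether A's send completed in time.
bool ReciprocalSend(bool yieldWhileWaiting) {
    Slot a, b;
    std::thread tb([&] {
        std::unique_lock<std::mutex> lk(g_m);
        b.cv.wait(lk, [&] { return b.incoming; });  // B receives A's message
        b.incoming = false;
        a.incoming = true;                          // B sends back to A
        a.cv.notify_all();
        b.cv.wait(lk, [&] { return b.replied; });   // B blocks until A answers
        a.replied = true;                           // B finishes and answers A
        a.cv.notify_all();
    });
    {
        std::lock_guard<std::mutex> lk(g_m);
        b.incoming = true;                          // A sends to B
        b.cv.notify_all();
    }
    const bool ok = WaitForReply(a, b, yieldWhileWaiting,
                                 std::chrono::milliseconds(1000));
    if (!ok) {
        // Stand-in for A's next GetMessage: service B's message so B can exit.
        std::unique_lock<std::mutex> lk(g_m);
        a.cv.wait(lk, [&] { return a.incoming; });
        a.incoming = false;
        b.replied = true;
        b.cv.notify_all();
    }
    tb.join();
    return ok;
}
```

`ReciprocalSend(true)` completes, while `ReciprocalSend(false)` can only time out: B's answer to A depends on A first servicing B's message, which the non-yielding wait never does.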

Thank you, Adrian. That was really helpful. — Daniele, Jan 31 '18 at 19:56
