user32!CreateWindowExW eventually calls user32!NtUserCreateWindowEx, which is a win32k syscall. That ends up calling win32k!xxxCreateWindowEx, which allocates a new window (tagWND) on the user heap using win32k!HmAllocObject. This returns an integer handle that indexes into the global user handle table managed by win32k. win32k!xxxCreateWindowEx then calls win32k!DwmAsyncChildCreate.

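For reference, this whole path can be driven from user mode with a plain CreateWindowExW call; a minimal sketch (the class name and window styles are arbitrary):

```c
#include <windows.h>

// Minimal user-mode trigger for the path above: CreateWindowExW is the
// documented entry point that ends up in the NtUserCreateWindowEx syscall.
int WINAPI wWinMain(HINSTANCE hInstance, HINSTANCE hPrev, PWSTR pCmdLine, int nCmdShow)
{
    WNDCLASSW wc = { 0 };
    wc.lpfnWndProc   = DefWindowProcW;       // no custom painting needed
    wc.hInstance     = hInstance;
    wc.lpszClassName = L"SyscallDemoClass";  // arbitrary class name
    RegisterClassW(&wc);

    // A breakpoint on win32k!xxxCreateWindowEx fires during this call.
    HWND hwnd = CreateWindowExW(0, wc.lpszClassName, L"demo", WS_OVERLAPPEDWINDOW,
                                CW_USEDEFAULT, CW_USEDEFAULT, 640, 480,
                                NULL, NULL, hInstance, NULL);
    ShowWindow(hwnd, nCmdShow);  // generates the show/hide messages to DWM
    return 0;
}
```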
win32k!DwmAsyncChildCreate constructs an LPC _PORT_MESSAGE and inserts the parameters of the call into the message: Object is the DWM ALPC port, a2 is the window handle hWND, and a3 is the thread's desktop window handle.
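For context, every (A)LPC message starts with the standard _PORT_MESSAGE header. Below is that header plus a guessed payload carrying the parameters just described; the payload layout is an assumption from the disassembly, not a published structure:

```c
#include <windows.h>
#include <winternl.h>   // CLIENT_ID

// Standard (A)LPC message header from the native API.
typedef struct _PORT_MESSAGE {
    union {
        struct {
            USHORT DataLength;      // payload size after the header
            USHORT TotalLength;     // header plus payload
        } s1;
        ULONG Length;
    } u1;
    union {
        struct {
            USHORT Type;            // LPC_* message type
            USHORT DataInfoOffset;
        } s2;
        ULONG ZeroInit;
    } u2;
    union {
        CLIENT_ID ClientId;         // sender's process and thread IDs
        double DoNotUseThisField;
    };
    ULONG MessageId;
    union {
        SIZE_T ClientViewSize;
        ULONG CallbackId;
    };
} PORT_MESSAGE, *PPORT_MESSAGE;

// Hypothetical payload for the DwmAsyncChildCreate message, matching the
// parameters above. This layout is a guess, not a published structure.
typedef struct _DWM_CHILD_CREATE_MSG {
    PORT_MESSAGE Header;
    ULONG        MessageKind;   // assumed discriminator used when routing
    HWND         WindowHandle;  // a2: the new window's hWND
    HWND         DesktopWindow; // a3: the thread's desktop window handle
} DWM_CHILD_CREATE_MSG;
```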

LPC is now implemented internally using ALPC, and can be synchronous or asynchronous, but in this case the message is sent asynchronously. dwm.exe has a port thread that waits on the port. The thread continually runs in CPortBase::PortThreadInternal, calling NtReplyWaitReceivePort on the ALPC port it created, and when it receives a message, it calls CPortBase::RoutePacket. Several messages are sent to DWM this way, including when the window is shown or hidden.
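On the dwm.exe side, the port thread can be pictured as a classic native-API receive loop, sketched here in the shape of CPortBase::PortThreadInternal (NtReplyWaitReceivePort is a real ntdll export declared by hand; RouteDwmPacket is a hypothetical stand-in for CPortBase::RoutePacket):

```c
#include <windows.h>

// PORT_MESSAGE is declared opaquely here; see the layout sketched earlier.
typedef struct _PORT_MESSAGE PORT_MESSAGE, *PPORT_MESSAGE;

// NtReplyWaitReceivePort is exported by ntdll but not declared in the SDK
// headers, so it is declared manually here.
typedef LONG NTSTATUS;
NTSYSAPI NTSTATUS NTAPI NtReplyWaitReceivePort(
    HANDLE PortHandle,
    PVOID *PortContext,
    PPORT_MESSAGE ReplyMessage,
    PPORT_MESSAGE ReceiveMessage);

// Hypothetical stand-in for CPortBase::RoutePacket.
void RouteDwmPacket(PPORT_MESSAGE msg);

void PortThreadLoop(HANDLE alpcPort)
{
    ULONG_PTR buffer[0x1000 / sizeof(ULONG_PTR)];  // aligned receive buffer
    PPORT_MESSAGE msg = (PPORT_MESSAGE)buffer;
    PVOID context = NULL;

    for (;;) {
        // Blocks until a message arrives on the ALPC port; the DWM messages
        // here are asynchronous datagrams, so no reply message is passed in.
        NTSTATUS status = NtReplyWaitReceivePort(alpcPort, &context, NULL, msg);
        if (status < 0)
            break;
        RouteDwmPacket(msg);  // cf. CPortBase::RoutePacket
    }
}
```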
Since DWM has access to the hWND, it can get the device context (((PDCE)Wnd->pcls->pdce)->hDC), and then the surface from the _DC structure (DC->dclevel.pSurface). On a VirtualBox Windows 7 VM, it appears to be an engine-managed surface, because very few calls are made to cdd.dll, and on ShowWindowEx, win32k!hsurfCreateCompatibleSurface calls cdd!DrvCreateDeviceBitmapEx, which returns unimplemented, so win32k!hsurfCreateCompatibleSurface instead calls win32k!SURFACE::bCreateDIB to create the surface. In this call, the surface object is inserted into the GDI handle table using HmgInsertObject (though the object is at a kernel address, the kernel address range is mapped to different physical pages for each process). The HBITMAP appears to be a handle to the _BASEOBJECT of the SURFACE object, and the SURFACE object contains an embedded _SURFOBJ, which points to the first scan line of the bitmap via pvScan0. HSURF and HBITMAP appear to be the same.
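Putting the chain together, from the tagWND to the bitmap bits, looks roughly like this. These are undocumented win32k structures, so the skeletons below show only the fields used in the chase, with names taken from the public symbols; treat the layouts as assumptions:

```c
// Skeleton subsets of undocumented win32k/GDI structures; only the fields
// used in the pointer chase are shown.
typedef struct _SURFOBJ_S {
    void *pvScan0;            // first scan line of the bitmap
} SURFOBJ_S;

typedef struct _SURFACE_S {
    void *hHmgr;              // _BASEOBJECT handle field; the HBITMAP/HSURF
    SURFOBJ_S SurfObj;        // embedded _SURFOBJ
} SURFACE_S;

typedef struct _DC_S {
    struct {
        SURFACE_S *pSurface;  // DC->dclevel.pSurface
    } dclevel;
} DC_S;

typedef struct _DCE_S { void *hDC; } DCE_S;    // per-class DC entry
typedef struct _CLS_S { DCE_S *pdce; } CLS_S;  // window class
typedef struct _WND_S { CLS_S *pcls; } WND_S;  // tagWND

// Given the window and its resolved _DC, the pixels are reached like so:
void *FirstScanLine(WND_S *Wnd, DC_S *Dc)
{
    void *hdc = Wnd->pcls->pdce->hDC;  // ((PDCE)Wnd->pcls->pdce)->hDC
    (void)hdc;                         // the hDC resolves to the _DC below
    return Dc->dclevel.pSurface->SurfObj.pvScan0;
}
```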
The _DXGK_DRIVERCAPS structure returned by dxgkrnl!DxgkCddGetDriverCaps is the same as the one in VboxMPWddm.cpp on GitHub, so it is clearly returning the _DXGK_DRIVERCAPS of the WDDM display miniport driver:

It can be seen that at offset 0x34, the _DXGK_PRESENTATIONCAPS contains 0x3, meaning SupportKernelModeCommandBuffer = 0, and therefore GDI is not accelerated.
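Decoding 0x3 against the leading bits of the WDK's DXGK_PRESENTATIONCAPS bitfield makes this concrete (a sketch; only the first three bits are reproduced, the rest are collapsed into Reserved):

```c
#include <stdio.h>

// Leading bitfields of DXGK_PRESENTATIONCAPS as in the WDK's d3dkmddi.h;
// the remaining bits are collapsed into Reserved for brevity.
typedef union {
    struct {
        unsigned NoScreenToScreenBlt            : 1;  // bit 0
        unsigned NoOverlapScreenBlt             : 1;  // bit 1
        unsigned SupportKernelModeCommandBuffer : 1;  // bit 2
        unsigned Reserved                       : 29;
    };
    unsigned Value;
} PRESENTATIONCAPS_SKETCH;

int main(void)
{
    PRESENTATIONCAPS_SKETCH caps = { .Value = 0x3 };  // value seen at offset 0x34
    printf("NoScreenToScreenBlt            = %u\n", caps.NoScreenToScreenBlt);
    printf("NoOverlapScreenBlt             = %u\n", caps.NoOverlapScreenBlt);
    printf("SupportKernelModeCommandBuffer = %u\n", caps.SupportKernelModeCommandBuffer);
    return 0;  // prints 1, 1, 0: GDI hardware acceleration is off
}
```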
It does still appear to pass some commands to dxgkrnl, which are then sent to the WDDM miniport, such as changing the pointer shape:

I'm not sure what happens when you have two screens and two display adapters, where one supports GDI acceleration and the other doesn't.
I net-debugged a Windows 10 machine with an iGPU, and something very different happened when creating a surface. This time, cdd!DrvCreateDeviceBitmapEx creates a device surface:

dxgkrnl!DXGDEVICE::CreateStandardAllocation eventually results in calls to the display miniport driver.
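The boundary being crossed here is the miniport's DxgkDdiCreateAllocation DDI. A rough sketch of that callback's shape, assuming the standard WDK prototypes (the segment set values are made up; a real driver derives them from the memory segments it reports):

```c
#include <ntddk.h>
#include <dispmprt.h>  // WDDM display miniport DDI definitions (WDK)

// Sketch of the miniport callback that dxgkrnl ends up invoking for
// DXGDEVICE::CreateStandardAllocation. The segment sets below are
// hypothetical placeholders, not values from any real driver.
NTSTATUS APIENTRY SketchDdiCreateAllocation(
    const HANDLE hAdapter,
    DXGKARG_CREATEALLOCATION *pCreateAllocation)
{
    UNREFERENCED_PARAMETER(hAdapter);

    for (UINT i = 0; i < pCreateAllocation->NumAllocations; i++) {
        DXGK_ALLOCATIONINFO *info = &pCreateAllocation->pAllocationInfo[i];

        // Tell VidMm which segments may back the allocation; an aperture
        // segment here is what makes the surface GPU-visible system memory.
        info->SupportedReadSegmentSet  = 0x1;  // hypothetical: segment 1
        info->SupportedWriteSegmentSet = 0x1;
        info->EvictionSegmentSet       = 0;
        info->Alignment                = 0;    // no special alignment
    }
    return STATUS_SUCCESS;
}
```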
The _DXGK_DRIVERCAPS is different this time, and SupportKernelModeCommandBuffer = 1:

Whereas on the VM the surface is in system memory, here the device surface (pointed to by pvScan0 of the _SURFOBJ) is created in an aperture memory segment, at an address accessible by the GPU. The surface is copied to VRAM and then copied back to the aperture memory. The CPU accesses the memory directly, whereas the GPU accesses the system memory through the aperture.
cdd.dll appears to be an XPDM-style display driver that takes the place of framebuf.dll, except that it does not pass anything down to an XPDM GPU driver (the video port/miniport pair, which used to be videoprt.sys and a miniport such as vgapnp.sys). Instead, it passes some GDI operations to the WDDM display miniport driver, and the operations that can't be accelerated fall back to the GDI engine, just as they would for an XPDM display driver.
DWM can then read this surface, convert it to a DirectX surface, composite it, and use the WDDM stack to display it on the screen.
I'm not sure about DirectX windows. I don't know much about the workings of DirectX, but I will update this post.