I have a test for a pixel shader that does some rendering and compares the result to a reference image to verify that the shader produces an expected output. When this test is run on a CI machine, it is on a VM without a GPU, so I call D3D11CreateDevice with D3D_DRIVER_TYPE_REFERENCE to use the reference rasterizer. We have been doing this for years without issue on a Windows 7 VM.
We are now trying to move to a Windows 10 VM for our CI tests. When I run the test there, various API calls start failing with DXGI_ERROR_DEVICE_REMOVED after some number of successful tests (on the order of 5000-10000), and calling GetDeviceRemovedReason returns DXGI_ERROR_DRIVER_INTERNAL_ERROR. After some debugging, I've found that the failure originates during a call to ID3D11DeviceContext::PSSetShader (yes, it returns void, but I found this with a breakpoint on KernelBase.dll!RaiseException). As far as I can tell, this call looks exactly like the thousands of previous calls to PSSetShader. It doesn't appear to be a resource issue: the process is using only 8 MB of memory when the error occurs, and the handle count is not growing.
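For CI logs, it can help to print the HRESULT returned by GetDeviceRemovedReason by name rather than as a raw number. The following is a minimal, portable sketch; DescribeHresult is a hypothetical helper name, and the numeric values are the ones winerror.h defines for these codes as far as I know, so double-check them against your SDK headers. HRESULT is a signed 32-bit type on Windows, so the helper takes std::uint32_t and the caller casts.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: maps a few HRESULTs relevant to device removal to
// their symbolic names. Values assumed to match winerror.h.
std::string DescribeHresult(std::uint32_t hr) {
    switch (hr) {
        case 0x00000000u: return "S_OK";
        case 0x887A0001u: return "DXGI_ERROR_INVALID_CALL";
        case 0x887A0005u: return "DXGI_ERROR_DEVICE_REMOVED";
        case 0x887A0006u: return "DXGI_ERROR_DEVICE_HUNG";
        case 0x887A0007u: return "DXGI_ERROR_DEVICE_RESET";
        case 0x887A0020u: return "DXGI_ERROR_DRIVER_INTERNAL_ERROR";
        case 0x8007000Eu: return "E_OUTOFMEMORY";
        default: {
            // Unknown code: fall back to the hex value so it is still logged.
            char buf[16];
            std::snprintf(buf, sizeof buf, "0x%08X", static_cast<unsigned>(hr));
            return buf;
        }
    }
}
```

In D3D11 code it would be called as DescribeHresult(static_cast<std::uint32_t>(device->GetDeviceRemovedReason())), where device is the ID3D11Device.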
I can reproduce the issue on multiple Win10 systems, and it succeeds on multiple Win7 systems. The big difference between the two is that on Win7 the API calls go through d3d11ref.dll, and on Win10 they go through d3d10warp.dll. I am not really familiar with what the differences are or why one or the other would be chosen, and MSDN's documentation is quite opaque on the subject. I know that d3d11ref.dll and d3d10warp.dll are both present on both the failing and the passing systems; I don't know what logic decides which one is loaded for the same set of calls, or why d3d10warp.dll fails.
So, can someone explain the difference between the two, and/or suggest how I could get d3d11ref.dll to load on Windows 10? As far as I can tell, it is a bug in d3d10warp.dll, and for now I would just like to sidestep it.
In case it matters: I call D3D11CreateDevice with the requested feature level set to D3D_FEATURE_LEVEL_11_0, and I verify that the same level is returned as the achieved feature level. I pass 0 for creationFlags, and D3D11_SDK_VERSION is defined as 7 in my d3d11.h. Below is the call stack above PSSetShader when the failure occurs. This seems to be the first call that fails, and every subsequent call that returns an HRESULT also fails.
KernelBase.dll!RaiseException()
KernelBase.dll!OutputDebugStringA()
d3d11.dll!CDevice::RemoveDevice(long)
d3d11.dll!NDXGI::CDevice::RemoveDevice()
d3d11.dll!CContext::UMSetError_()
d3d10warp.dll!UMDevice::MSCB_SetError(long,enum UMDevice::DDI_TYPE)
d3d10warp.dll!UMContext::SetShaderWithInterfaces(enum PIXELJIT_SHADER_STAGE,struct D3D10DDI_HSHADER,unsigned int,unsigned int const *,struct D3D11DDIARG_POINTERDATA const *)
d3d10warp.dll!UMDevice::PsSetShaderWithInterfaces(struct D3D10DDI_HDEVICE,struct D3D10DDI_HSHADER,unsigned int,unsigned int const *,struct D3D11DDIARG_POINTERDATA const *)
d3d11.dll!CContext::TID3D11DeviceContext_SetShaderWithInterfaces_<1,4>(class CContext *,struct ID3D11PixelShader *,struct ID3D11ClassInstance * const *,unsigned int)
d3d11.dll!CContext::TID3D11DeviceContext_SetShader_<1,4>()
MyTest.exe!MyFunctionThatCallsPSSetShader()
Update: With the D3D debug layer enabled, I get the following additional output when the error occurs:
D3D11: Removing Device.
D3D11 WARNING: ID3D11Device::RemoveDevice: Device removal has been triggered for the following reason (DXGI_ERROR_DRIVER_INTERNAL_ERROR: There is strong evidence that the driver has performed an undefined operation; but it may be because the application performed an illegal or undefined operation to begin with.). [ EXECUTION WARNING #379: DEVICE_REMOVAL_PROCESS_POSSIBLY_AT_FAULT]
D3D11 ERROR: ID3D11DeviceContext::Map: Returning DXGI_ERROR_DEVICE_REMOVED, when a Resource was trying to be mapped with READ or READWRITE. [ RESOURCE_MANIPULATION ERROR #2097214: RESOURCE_MAP_DEVICEREMOVED_RETURN]
The third line, about the call to Map, appears only because my test fails to notice and handle the device removal and later tries to map a texture, so I don't think it's related. The warning is about what I expected: there's an error in the driver, and possibly my test is doing something bad to cause it. I still don't know what that might be, or why it worked on Windows 7.
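Because the test run kept going after the removal, the first visible error was the unrelated Map call. One way to make the harness fail fast is to check the device state after every test iteration and stop at the first non-S_OK result, recording which iteration triggered it. This is a minimal sketch that keeps the D3D calls out so it compiles anywhere: RunUntilDeviceRemoved and RemovalInfo are hypothetical names, and the removedReason callback is assumed to wrap ID3D11Device::GetDeviceRemovedReason, which returns S_OK (0) while the device is healthy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>

// Hypothetical record of the first detected removal.
struct RemovalInfo {
    std::size_t iteration;   // index of the test iteration that preceded it
    std::uint32_t reason;    // HRESULT from GetDeviceRemovedReason, as uint32
};

// Runs runOne(i) for i = 0..count-1. After each iteration it queries
// removedReason() (assumed to wrap GetDeviceRemovedReason) and stops at the
// first non-zero result instead of letting later calls fail.
std::optional<RemovalInfo> RunUntilDeviceRemoved(
    std::size_t count,
    const std::function<void(std::size_t)>& runOne,
    const std::function<std::uint32_t()>& removedReason) {
    assert(runOne && removedReason);  // both callbacks must be set
    for (std::size_t i = 0; i < count; ++i) {
        runOne(i);
        const std::uint32_t reason = removedReason();
        if (reason != 0) {
            return RemovalInfo{i, reason};
        }
    }
    return std::nullopt;  // device survived every iteration
}
```

Knowing the exact iteration makes it easier to rerun only the tests around it with the debug layer enabled, rather than searching through thousands of successful runs.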
Update 2: I have found that if I run my tests on Windows 10 in Windows 7 compatibility mode, there is no device-removed error and all of my tests pass. It is still using d3d10warp.dll rather than d3d11ref.dll, so the choice of DLL alone wasn't the problem. I'm not sure how to investigate what I'm doing that's incompatible with Windows 10 or its WARP device; this might need a Microsoft support ticket.