
I've been struggling with a resource leak that seems to be caused by NVIDIA's H.264 encoder MFT. Each time a frame is submitted to the encoder, the reference count of my D3D device is incremented by 1, and this reference is not released even after shutting down the MFT. A number of threads are leaked as well.

I'm almost ready to bring this up with NVIDIA, but I'd like to first make sure there's nothing obvious I have missed. Please see my implementation below - I've tried to keep it as concise and clear as possible.

Arguments for why this might be a problem with NVIDIA's encoder:

  • This only happens with NVIDIA's encoder. No leak is observed when running on e.g. Intel's QuickSync.

Arguments for why this might be a problem in my code:

  • I've tried using a SinkWriter to write DXGI surfaces to a file in a similar fashion, and here the leak is not present. Unfortunately I don't have access to the source code of SinkWriter. I would be very happy if anyone could point me to some working sample code that I could compare against.
#pragma comment(lib, "D3D11.lib")
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mf.lib")
#pragma comment(lib, "evr.lib")
#pragma comment(lib, "mfuuid.lib")
#pragma comment(lib, "Winmm.lib")

// std
#include <cstdio>     // printf, getchar
#include <exception>  // std::exception
#include <iostream>
#include <string>

// Windows
#include <windows.h>
#include <atlbase.h>

// DirectX
#include <d3d11.h>

// Media Foundation
#include <mfapi.h>
#include <mfplay.h>
#include <mfreadwrite.h>
#include <mferror.h>
#include <Codecapi.h>

// Error handling
#define CHECK(x) if (!(x)) { printf("%s(%d) %s was false\n", __FILE__, __LINE__, #x); throw std::exception(); }
#define CHECK_HR(x) { HRESULT hr_ = (x); if (FAILED(hr_)) { printf("%s(%d) %s failed with 0x%08lx\n", __FILE__, __LINE__, #x, (unsigned long)hr_); throw std::exception(); } }

// Constants
constexpr UINT ENCODE_WIDTH = 1920;
constexpr UINT ENCODE_HEIGHT = 1080;
constexpr UINT ENCODE_FRAMES = 120;

void runEncode();

int main()
{
    CHECK_HR(CoInitializeEx(NULL, COINIT_APARTMENTTHREADED));
    CHECK_HR(MFStartup(MF_VERSION));

    for (;;)
    {
        runEncode();
        if (getchar() == 'q')
            break;
    }

    CHECK_HR(MFShutdown());

    return 0;
}

void runEncode()
{
    CComPtr<ID3D11Device> device;
    CComPtr<ID3D11DeviceContext> context;
    CComPtr<IMFDXGIDeviceManager> deviceManager;

    CComPtr<IMFVideoSampleAllocatorEx> allocator;
    CComPtr<IMFTransform> transform;
    CComPtr<IMFAttributes> transformAttrs;
    CComQIPtr<IMFMediaEventGenerator> eventGen;
    DWORD inputStreamID;
    DWORD outputStreamID;


    // ------------------------------------------------------------------------
    // Initialize D3D11
    // ------------------------------------------------------------------------

    CHECK_HR(D3D11CreateDevice(NULL, D3D_DRIVER_TYPE_HARDWARE, NULL, D3D11_CREATE_DEVICE_VIDEO_SUPPORT | D3D11_CREATE_DEVICE_DEBUG, NULL, 0, D3D11_SDK_VERSION, &device, NULL, &context));

    {
        // Probably not necessary in this application, but maybe the MFT requires it?
        CComQIPtr<ID3D10Multithread> mt(device);
        CHECK(mt);
        mt->SetMultithreadProtected(TRUE);
    }

    // Create device manager
    UINT resetToken;
    CHECK_HR(MFCreateDXGIDeviceManager(&resetToken, &deviceManager));
    CHECK_HR(deviceManager->ResetDevice(device, resetToken));


    // ------------------------------------------------------------------------
    // Initialize hardware encoder MFT
    // ------------------------------------------------------------------------

    {
        // Find the encoder
        CComHeapPtr<IMFActivate*> activateRaw;
        UINT32 activateCount = 0;

        // Input & output types
        MFT_REGISTER_TYPE_INFO inInfo = { MFMediaType_Video, MFVideoFormat_NV12 };
        MFT_REGISTER_TYPE_INFO outInfo = { MFMediaType_Video, MFVideoFormat_H264 };

        // Query for the adapter LUID to get a matching encoder for the device.
        CComQIPtr<IDXGIDevice> dxgiDevice(device);
        CHECK(dxgiDevice);
        CComPtr<IDXGIAdapter> adapter;
        CHECK_HR(dxgiDevice->GetAdapter(&adapter));

        DXGI_ADAPTER_DESC adapterDesc;
        CHECK_HR(adapter->GetDesc(&adapterDesc));

        CComPtr<IMFAttributes> enumAttrs;
        CHECK_HR(MFCreateAttributes(&enumAttrs, 1));
        CHECK_HR(enumAttrs->SetBlob(MFT_ENUM_ADAPTER_LUID, (BYTE*)&adapterDesc.AdapterLuid, sizeof(LUID)));

        CHECK_HR(MFTEnum2(MFT_CATEGORY_VIDEO_ENCODER, MFT_ENUM_FLAG_HARDWARE | MFT_ENUM_FLAG_SORTANDFILTER, &inInfo, &outInfo, enumAttrs, &activateRaw, &activateCount));

        CHECK(activateCount != 0);

        // Choose the first returned encoder
        CComPtr<IMFActivate> activate = activateRaw[0];

        // Memory management
        for (UINT32 i = 0; i < activateCount; i++)
            activateRaw[i]->Release();

        // Activate
        CHECK_HR(activate->ActivateObject(IID_PPV_ARGS(&transform)));

        // Get attributes
        CHECK_HR(transform->GetAttributes(&transformAttrs));
    }


    // ------------------------------------------------------------------------
    // Query encoder name (not necessary, but nice) and unlock for async use
    // ------------------------------------------------------------------------

    {

        UINT32 nameLength = 0;
        std::wstring name;

        CHECK_HR(transformAttrs->GetStringLength(MFT_FRIENDLY_NAME_Attribute, &nameLength));

        // IMFAttributes::GetString returns a null-terminated wide string
        name.resize((size_t)nameLength + 1);
        CHECK_HR(transformAttrs->GetString(MFT_FRIENDLY_NAME_Attribute, &name[0], (UINT32)name.size(), &nameLength));
        name.resize(nameLength);

        printf("Using %ls\n", name.c_str());

        // Unlock the transform for async use and get event generator
        CHECK_HR(transformAttrs->SetUINT32(MF_TRANSFORM_ASYNC_UNLOCK, TRUE));
        CHECK(eventGen = transform);
    }

    // Get stream IDs (expect 1 input and 1 output stream)
    {
        HRESULT hr = transform->GetStreamIDs(1, &inputStreamID, 1, &outputStreamID);
        if (hr == E_NOTIMPL)
        {
            inputStreamID = 0;
            outputStreamID = 0;
            hr = S_OK;
        }
        CHECK_HR(hr);
    }


    // ------------------------------------------------------------------------
    // Configure hardware encoder MFT
    // ------------------------------------------------------------------------

    // Set D3D manager
    CHECK_HR(transform->ProcessMessage(MFT_MESSAGE_SET_D3D_MANAGER, reinterpret_cast<ULONG_PTR>(deviceManager.p)));

    // Set output type
    CComPtr<IMFMediaType> outputType;
    CHECK_HR(MFCreateMediaType(&outputType));

    CHECK_HR(outputType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video));
    CHECK_HR(outputType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264));
    CHECK_HR(outputType->SetUINT32(MF_MT_AVG_BITRATE, 30000000));
    CHECK_HR(MFSetAttributeSize(outputType, MF_MT_FRAME_SIZE, ENCODE_WIDTH, ENCODE_HEIGHT));
    CHECK_HR(MFSetAttributeRatio(outputType, MF_MT_FRAME_RATE, 60, 1));
    CHECK_HR(outputType->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive));
    CHECK_HR(outputType->SetUINT32(MF_MT_ALL_SAMPLES_INDEPENDENT, TRUE));

    CHECK_HR(transform->SetOutputType(outputStreamID, outputType, 0));

    // Set input type
    CComPtr<IMFMediaType> inputType;
    CHECK_HR(transform->GetInputAvailableType(inputStreamID, 0, &inputType));

    CHECK_HR(inputType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video));
    CHECK_HR(inputType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_NV12));
    CHECK_HR(MFSetAttributeSize(inputType, MF_MT_FRAME_SIZE, ENCODE_WIDTH, ENCODE_HEIGHT));
    CHECK_HR(MFSetAttributeRatio(inputType, MF_MT_FRAME_RATE, 60, 1));

    CHECK_HR(transform->SetInputType(inputStreamID, inputType, 0));


    // ------------------------------------------------------------------------
    // Create sample allocator
    // ------------------------------------------------------------------------

    {
        CHECK_HR(MFCreateVideoSampleAllocatorEx(IID_PPV_ARGS(&allocator)));
        CHECK(allocator);

        CComPtr<IMFAttributes> allocAttrs;
        CHECK_HR(MFCreateAttributes(&allocAttrs, 2));

        CHECK_HR(allocAttrs->SetUINT32(MF_SA_D3D11_BINDFLAGS, D3D11_BIND_RENDER_TARGET));
        CHECK_HR(allocAttrs->SetUINT32(MF_SA_D3D11_USAGE, D3D11_USAGE_DEFAULT));

        CHECK_HR(allocator->SetDirectXManager(deviceManager));
        CHECK_HR(allocator->InitializeSampleAllocatorEx(1, 2, allocAttrs, inputType));
    }


    // ------------------------------------------------------------------------
    // Start encoding
    // ------------------------------------------------------------------------

    CHECK_HR(transform->ProcessMessage(MFT_MESSAGE_COMMAND_FLUSH, NULL));
    CHECK_HR(transform->ProcessMessage(MFT_MESSAGE_NOTIFY_BEGIN_STREAMING, NULL));
    CHECK_HR(transform->ProcessMessage(MFT_MESSAGE_NOTIFY_START_OF_STREAM, NULL));

    // Encode loop
    for (int i = 0; i < ENCODE_FRAMES; i++)
    {
        // Get next event
        CComPtr<IMFMediaEvent> event;
        CHECK_HR(eventGen->GetEvent(0, &event));

        MediaEventType eventType;
        CHECK_HR(event->GetType(&eventType));

        switch (eventType)
        {
        case METransformNeedInput:
        {
            CComPtr<IMFSample> sample;
            CHECK_HR(allocator->AllocateSample(&sample));
            CHECK_HR(transform->ProcessInput(inputStreamID, sample, 0));

            // Releasing one device reference after feeding each frame "fixes" the leak.
            //device.p->Release();

            break;
        }

        case METransformHaveOutput:
        {
            DWORD status;
            MFT_OUTPUT_DATA_BUFFER outputBuffer = {};
            outputBuffer.dwStreamID = outputStreamID;

            CHECK_HR(transform->ProcessOutput(0, 1, &outputBuffer, &status));

            DWORD bufCount;
            DWORD bufLength;
            CHECK_HR(outputBuffer.pSample->GetBufferCount(&bufCount));

            CComPtr<IMFMediaBuffer> outBuffer;
            CHECK_HR(outputBuffer.pSample->GetBufferByIndex(0, &outBuffer));
            CHECK_HR(outBuffer->GetCurrentLength(&bufLength));

            printf("METransformHaveOutput buffers=%lu, bytes=%lu\n", bufCount, bufLength);

            // Release the sample as it is not processed further.
            if (outputBuffer.pSample)
                outputBuffer.pSample->Release();
            if (outputBuffer.pEvents)
                outputBuffer.pEvents->Release();
            break;
        }
        }
    }

    // ------------------------------------------------------------------------
    // Finish encoding
    // ------------------------------------------------------------------------

    CHECK_HR(transform->ProcessMessage(MFT_MESSAGE_NOTIFY_END_OF_STREAM, NULL));
    CHECK_HR(transform->ProcessMessage(MFT_MESSAGE_NOTIFY_END_STREAMING, NULL));
    CHECK_HR(transform->ProcessMessage(MFT_MESSAGE_COMMAND_DRAIN, NULL));

    // Shutdown
    printf("Finished encoding\n");

    // I've tried all kinds of things...
    //CHECK_HR(transform->ProcessMessage(MFT_MESSAGE_SET_D3D_MANAGER, reinterpret_cast<ULONG_PTR>(nullptr)));

    //transform->SetInputType(inputStreamID, NULL, 0);
    //transform->SetOutputType(outputStreamID, NULL, 0);

    //transform->DeleteInputStream(inputStreamID);

    //deviceManager->ResetDevice(NULL, resetToken);

    CHECK_HR(MFShutdownObject(transform));
}
oguz ismail
KeloCube
  • I think you'd be better off posting this on https://devtalk.nvidia.com/ right away. It's quite possible their MFT is leaky; I have not noticed them caring much about it. – Roman R. May 22 '20 at 17:11
  • That's the impression I got as well, but I guess it's worth a shot. Last time I asked about their MFT's odd behaviour I was told that they were looking into it, but they never followed up. Thanks for taking a look at the question Roman, your blog and SO answers have been an invaluable resource in my development journey with MF! – KeloCube May 23 '20 at 01:15
  • Is this fixed?? – Meme Machine Dec 30 '21 at 22:48
  • Yes, the leak seems to be fixed in Nvidia's latest drivers. However, I found the overall quality of the vendor-provided MFTs quite lacking (not only in Nvidia's case) and would not recommend using them for a serious product. Platform-specific APIs (NVENC, etc.) should likely be more reliable. – KeloCube Jan 05 '22 at 03:35

1 Answer

I think the answer is “yes”.

I have seen this problem before; see "Is it possible to shut down a D3D device?"

As a workaround, I stopped re-creating D3D devices. Instead I’m using a global CAtlMap collection. The keys are uint64_t values containing the LUID of the GPU from the DXGI_ADAPTER_DESC::AdapterLuid field. The values are structures with 2 fields, CComPtr<ID3D11Device> and CComPtr<IMFDXGIDeviceManager>.
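
A minimal sketch of that kind of cache, kept portable so it can be compiled outside Windows. It swaps in `std::unordered_map` for `CAtlMap`, and `DeviceCache`, `packLuid`, and the `Entry` template parameter are hypothetical names, not part of any Microsoft API. In a real application `Entry` would presumably be the struct holding the `CComPtr<ID3D11Device>` and `CComPtr<IMFDXGIDeviceManager>`, and the factory would create them for the adapter with the given LUID.

```cpp
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
#include <unordered_map>
#include <utility>

// Packs the two halves of a LUID (DWORD LowPart, LONG HighPart) into one 64-bit key.
inline std::uint64_t packLuid(std::uint32_t lowPart, std::int32_t highPart) {
    return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(highPart)) << 32) | lowPart;
}

// Hypothetical process-wide cache: each adapter gets one Entry that is created
// on first use and then reused, so devices are never re-created.
template <class Entry>
class DeviceCache {
public:
    using Factory = std::function<std::shared_ptr<Entry>(std::uint64_t luid)>;

    explicit DeviceCache(Factory factory) : factory_(std::move(factory)) {}

    // Returns the cached entry for the adapter, creating it on the first request.
    std::shared_ptr<Entry> get(std::uint64_t luid) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = entries_.find(luid);
        if (it != entries_.end())
            return it->second;
        std::shared_ptr<Entry> created = factory_(luid);
        if (created)  // do not cache a failed creation
            entries_.emplace(luid, created);
        return created;
    }

private:
    Factory factory_;
    std::mutex mutex_;
    std::unordered_map<std::uint64_t, std::shared_ptr<Entry>> entries_;
};
```

With this shape, each encode session would call `get(packLuid(luid.LowPart, luid.HighPart))` instead of creating a new device, so any reference the encoder fails to release stays attached to a device that is kept alive anyway.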

Soonts
  • I found that the leaking component was specifically NVIDIA's encoder MFT - simply creating and destroying a D3D device in a loop does not leak. Your solution is a reasonable fix, but it's not applicable in my use case since my application will be installed as a service and I need to release all graphics resources when the application is not doing anything. I ended up manually checking for and releasing the leaky reference if the encoder's vendor id matches NVIDIA's. – KeloCube Jul 09 '20 at 15:25
  • @KeloCube In this case, you can try using the return value from the AddRef or Release method. Microsoft does not recommend doing that, saying it’s unreliable, and I’m not sure the value is actually correct for that particular object, but I think it’s the lesser of two evils compared to checking the vendor ID. Your current code may cause your software to crash if nVidia fixes their code someday. – Soonts Jul 09 '20 at 15:46
  • Yeah, good point - that's exactly what I'm doing. What I meant is that I check the vendor id first and apply the "fix" only if it matches. – KeloCube Jul 09 '20 at 16:04
  • @KeloCube Also, for your use case there’s a reliable solution (albeit one that can be expensive to implement): split your service into 2 processes, a network frontend and a GPU backend. Once idle, ask the backend process to quit; this will 100% release everything. It will also consume much less memory when idle. Heaps, especially unmanaged ones, are reluctant to give memory back to the OS: they assume you might need it later and retain the memory. – Soonts Jul 09 '20 at 17:17
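
The reference-count check discussed in the comments above could look roughly like the following sketch. The helper names `peekRefCount` and `releaseExcessRefs` are hypothetical, and `FakeCounted` is only a stand-in so the example compiles without the Windows SDK; the assumption is that the real target would be the `ID3D11Device`, with `expected` set to the count observed before the encoder was created. As noted in the comments, Microsoft documents the return value of `AddRef` and `Release` as intended for testing only, so this remains a hack rather than a proper fix.

```cpp
// Reads the reference count the object reports right now, using the
// AddRef/Release return values (documented as unreliable; diagnostic use only).
template <class T>
unsigned long peekRefCount(T* obj) {
    obj->AddRef();
    return obj->Release();
}

// Drops references until the reported count is back at `expected`.
// Returns how many references were released. If the leak is fixed upstream,
// the loop body never runs and nothing is over-released.
template <class T>
unsigned long releaseExcessRefs(T* obj, unsigned long expected) {
    if (expected == 0)
        expected = 1;  // never release the caller's own reference
    unsigned long released = 0;
    while (peekRefCount(obj) > expected) {
        obj->Release();
        ++released;
    }
    return released;
}

// Stand-in for a COM object (assumption: only AddRef/Release matter here).
struct FakeCounted {
    unsigned long refs = 1;
    unsigned long AddRef() { return ++refs; }
    unsigned long Release() { return --refs; }
};
```

Because the loop compares against a count measured earlier instead of assuming one leaked reference per frame, it does nothing on drivers where the leak no longer occurs, which addresses the crash risk raised above.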