
I am reading AAC audio frames, decoding them to PCM with Media Foundation, and trying to play them back through WASAPI, specifically at 48000 Hz, 2 channels, 16-bit. I am able to decode the frames, write them to a file full.pcm, and then open and play that PCM file successfully in Audacity. However, my code to play back through the device speakers gives me nothing. The device I am trying to play through is the default render endpoint, which is my DAC. I am not getting any bad HRESULTs from any of the WASAPI-related code, so I'm confused. WASAPI is new to me though, so maybe there is something obvious I am missing.

#include "AudioDecoder.h"
#include <vector>
#include <chrono>
#include <iostream>
#include <string>
#include <fstream>
#include <cassert>
#include <filesystem>

#include <mmdeviceapi.h>
#include <endpointvolume.h>
#include <functiondiscoverykeys.h> 
#include <audioclient.h>

int fps_counter = 0;
int frame_index = 0;

IAudioClient* audio_client;
IAudioRenderClient* render_client = nullptr;

int setup_audio_playback()
{
    HRESULT hr = S_OK;

    IMMDeviceEnumerator* pEnumerator = nullptr;
    IMMDevice* pDevice = nullptr;

    ATLENSURE_SUCCEEDED(CoCreateInstance(__uuidof(MMDeviceEnumerator), nullptr, CLSCTX_ALL, __uuidof(IMMDeviceEnumerator), (void**)&pEnumerator));

    ATLENSURE_SUCCEEDED(pEnumerator->GetDefaultAudioEndpoint(eRender, eConsole, &pDevice));

    IPropertyStore* ips;
    ATLENSURE_SUCCEEDED(pDevice->OpenPropertyStore(STGM_READ, &ips));

    PROPVARIANT varName;
    // Initialize container for property value.
    PropVariantInit(&varName);
    ATLENSURE_SUCCEEDED(ips->GetValue(PKEY_Device_FriendlyName, &varName));

    std::wcout << L"Device name: " << varName.pwszVal << std::endl;

    ATLENSURE_SUCCEEDED(pDevice->Activate(__uuidof(IAudioClient), CLSCTX_ALL, nullptr, (void**)&audio_client));

    WAVEFORMATEX* format;
    ATLENSURE_SUCCEEDED(audio_client->GetMixFormat(&format));

    ATLENSURE_SUCCEEDED(audio_client->Initialize(AUDCLNT_SHAREMODE_SHARED, 0, 10000000, 0, format, NULL));

    uint32_t bufferFrameCount;
    ATLENSURE_SUCCEEDED(audio_client->GetBufferSize(&bufferFrameCount));

    ATLENSURE_SUCCEEDED(audio_client->GetService(__uuidof(IAudioRenderClient), (void**)&render_client));

    ATLENSURE_SUCCEEDED(audio_client->Start());

    return hr;
}

int main()
{
    HRESULT hr = S_OK;

    std::ofstream fout_all_frames_pcm;

    std::filesystem::remove(std::filesystem::current_path() / "full.pcm");

    fout_all_frames_pcm.open("full.pcm", std::ios::binary | std::ios::out);

    if (FAILED(hr = CoInitializeEx(nullptr, COINIT_APARTMENTTHREADED)))
        return hr;
    if (FAILED(hr = MFStartup(MF_VERSION)))
        return hr;

    setup_audio_playback();

    AudioDecoder* ad = new AudioDecoder();

    std::vector<uint8_t> data;

    while (true)
    {
        std::chrono::time_point<std::chrono::steady_clock> iteration_time = std::chrono::high_resolution_clock::now();

        // Read frame data
        std::ifstream fin("Encoded Audio Frames\\frame" + std::to_string(frame_index) + ".aac", std::ios::binary | std::ios::in);

        if (fin.fail())
        {
            //throw std::runtime_error("Invalid file path specified");
            break;
        }

        // Get file length
        fin.seekg(0, std::ios::end);
        size_t const length = fin.tellg();
        fin.seekg(0, std::ios::beg);

        if (length > data.size())
        {
            static size_t constexpr const granularity = 64 << 10;
            data.resize((length + (granularity - 1)) & ~(granularity - 1));
            assert(length <= data.size());
        }

        // Copy frame data from file to array;
        fin.read(reinterpret_cast<char*>(data.data()), length);
        fin.close();

        CComPtr<IMFSample> pcm_sample;
        while (!ad->decode_sync(data.data(), length, &pcm_sample))
        {
            if (pcm_sample == nullptr) // This will happen if the decoder isn't able to produce output yet, so we will continue in that case
                continue;

            CComPtr<IMFMediaBuffer> buffer;
            if (FAILED(hr = pcm_sample->ConvertToContiguousBuffer(&buffer)))
                return hr;

            unsigned char* datas;
            DWORD length;
            if (FAILED(hr = buffer->GetCurrentLength(&length)))
                return hr;

            if (FAILED(hr = buffer->Lock(&datas, nullptr, &length)))
                return hr;

            fout_all_frames_pcm.write((char*)datas, length);

            // Does nothing
            //Sleep(120);

            // Grab all the available space in the shared buffer.
            uint8_t* pData;
            ATLENSURE_SUCCEEDED(render_client->GetBuffer(1, &pData));

            memcpy(pData, datas, length);

            DWORD flags = 0;
            ATLENSURE_SUCCEEDED(render_client->ReleaseBuffer(1, flags));

            pcm_sample.Release();
        }

        frame_index++;
    }

    audio_client->Stop();

    return 0;
}
Meme Machine

1 Answer

Doing

render_client->GetBuffer(1, ...

will not give you any stable behavior, because you are submitting data one sample at a time: literally one PCM sample out of your 48000 samples per second. The code is likely broken beyond this, too, because you are getting much more data from the decoder and feeding just one sample of it to the device, losing most of the data.
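A corrected submission loop might look like the sketch below. It is not compiled against your project; it assumes the `format`, `audio_client`, `render_client`, and `bufferFrameCount` values from your `setup_audio_playback` are accessible, and `src`/`bytes` are the locked `IMFMediaBuffer` data and length. The idea is to size each `GetBuffer` call by the free space WASAPI reports via `GetCurrentPadding`, rather than passing 1:

```cpp
// Sketch: submit one decoded IMFSample's PCM in chunks sized to the free
// space in the shared buffer, instead of one audio frame at a time.
HRESULT submit_pcm(const BYTE* src, DWORD bytes)
{
    const UINT32 bytesPerFrame = format->nBlockAlign; // 4 for 16-bit stereo
    UINT32 framesLeft = bytes / bytesPerFrame;

    while (framesLeft > 0)
    {
        UINT32 padding = 0;
        ATLENSURE_SUCCEEDED(audio_client->GetCurrentPadding(&padding));

        // Free space in the shared buffer, in frames.
        UINT32 framesFree = bufferFrameCount - padding;
        if (framesFree == 0)
        {
            Sleep(5); // buffer is full; let the device drain some of it
            continue;
        }

        UINT32 framesToWrite = framesFree < framesLeft ? framesFree : framesLeft;

        BYTE* pData = nullptr;
        ATLENSURE_SUCCEEDED(render_client->GetBuffer(framesToWrite, &pData));
        memcpy(pData, src, framesToWrite * bytesPerFrame);
        ATLENSURE_SUCCEEDED(render_client->ReleaseBuffer(framesToWrite, 0));

        src += framesToWrite * bytesPerFrame;
        framesLeft -= framesToWrite;
    }
    return S_OK;
}
```

The `Sleep(5)` is a crude placeholder for the timer/event-driven pacing the documentation recommends.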

You would want to check this article, in the part where the code works out how many frames GetBuffer can carry, and then loop, filling those buffers until you have consumed your IMFSample data.

How large are the buffers you obtain with GetBuffer? For 10 ms buffers, which are pretty typical, and a 48 kHz sampling rate, you would have 480 frames per buffer. With stereo 16-bit PCM each frame is four bytes, so you would be delivering around 2 KB on every GetBuffer/ReleaseBuffer iteration.

Roman R.
  • I've read the article, and I dislike the sample they used, especially since it uses some custom class. I'm also quite confused about why they are sleeping for half a second on each iteration. Here is my code: https://gist.github.com/MemeMachineSO/9fef0d1c599f87993af7b9e10e2bce88 I got it "working", i.e. it is playing sound and it seems fine, but the calculations are almost certainly inaccurate. I am just dividing by 100 because it seemed to work, not for any particular reason (which is bad lol). I am not using any Sleep at all - should I be? The sample does. I'm just really confused – Meme Machine Sep 03 '22 at 23:09
  • It's also unclear to me whether any of these APIs like ReleaseBuffer are blocking or not. I feel my code shouldn't work, but it "does". – Meme Machine Sep 03 '22 at 23:23
  • Sleeping in this sample code simplifies the sample, to show that you are adding data bit by bit with the pace of playback. You would normally use events or timers; better samples are located [there](https://learn.microsoft.com/en-us/windows/win32/coreaudio/rendersharedeventdriven). – Roman R. Sep 04 '22 at 07:23
  • AFAIR the calls are not blocking. If you call too frequently, while still doing all the math described by the documentation (padding, next buffer size etc.), you would just see that you can append zero bytes because the current buffer is already full. You can easily check this out by feeding too much data. – Roman R. Sep 04 '22 at 07:31
  • Alright, so to confirm - the reason there is a sleep in the example is that, since none of the APIs are blocking, it needs to allow time for the sound to play, right? Now, is the only reason my code worked without a sleep just chance? – Meme Machine Sep 04 '22 at 14:58
  • The reason is that the code indicates that played data is not necessarily available at once and can be fed as playback goes. The API does not make promises and might be playing well in this particular case, for example, by expanding an internal buffer to accept your data, and your data is small enough not to hit any other limit where the API might block, return a failure, or exhibit different behavior otherwise. – Roman R. Sep 04 '22 at 15:59
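The event-driven approach from the linked rendersharedeventdriven sample can be sketched roughly as below. This is an assumption-laden fragment, not a drop-in replacement: it reuses the `audio_client`, `render_client`, `format`, and `bufferFrameCount` names from the question, and omits error cleanup:

```cpp
// Ask WASAPI to signal an event each time a buffer period elapses,
// instead of sleeping on a guessed interval.
HANDLE hEvent = CreateEvent(nullptr, FALSE, FALSE, nullptr);

ATLENSURE_SUCCEEDED(audio_client->Initialize(
    AUDCLNT_SHAREMODE_SHARED,
    AUDCLNT_STREAMFLAGS_EVENTCALLBACK, // enables SetEventHandle below
    10000000, 0, format, nullptr));
ATLENSURE_SUCCEEDED(audio_client->SetEventHandle(hEvent));
ATLENSURE_SUCCEEDED(audio_client->Start());

for (;;)
{
    // Wakes once per device period, when buffer space has been freed.
    WaitForSingleObject(hEvent, INFINITE);

    UINT32 padding = 0;
    ATLENSURE_SUCCEEDED(audio_client->GetCurrentPadding(&padding));
    UINT32 framesFree = bufferFrameCount - padding;

    // GetBuffer(framesFree, ...), memcpy up to framesFree frames of
    // decoded PCM, then ReleaseBuffer(framesWritten, 0) as usual.
}
```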