WASAPI captured packets do not align

Question

I'm trying to visualize a soundwave captured by WASAPI loopback but find that the packets I record do not form a smooth wave when put together.

My understanding of how the WASAPI capture client works is that when I call pCaptureClient->GetBuffer(&pData, &numFramesAvailable, &flags, NULL, NULL) the buffer pData is filled from the front with numFramesAvailable datapoints. Each datapoint is a float and they alternate by channel. Thus to get all available datapoints I should cast pData to a float pointer, and take the first channels * numFramesAvailable values. Once I release the buffer and call GetBuffer again it provides the next packet. I would assume that these packets would follow on from each other but it doesn't seem to be the case.

My guess is that either I'm making an incorrect assumption about the format of the audio data in pData, or the capture client is missing or overlapping frames, but I have no idea how to check either.

To make the code below as brief as possible I've removed things like error status checking and cleanup.

Initialization of capture client:

const CLSID CLSID_MMDeviceEnumerator = __uuidof(MMDeviceEnumerator);
const IID IID_IMMDeviceEnumerator = __uuidof(IMMDeviceEnumerator);
const IID IID_IAudioClient = __uuidof(IAudioClient);
const IID IID_IAudioCaptureClient = __uuidof(IAudioCaptureClient);

IMMDeviceEnumerator * pDeviceEnumerator = NULL;
IMMDevice * pDeviceEndpoint = NULL;
IAudioClient *pAudioClient = NULL;
IAudioCaptureClient *pCaptureClient = NULL;
int channels;
// Initialize audio device endpoint
CoInitialize(nullptr);
CoCreateInstance(CLSID_MMDeviceEnumerator, NULL, CLSCTX_ALL, IID_IMMDeviceEnumerator, (void**)&pDeviceEnumerator);
pDeviceEnumerator->GetDefaultAudioEndpoint(eRender, eConsole, &pDeviceEndpoint);

// init audio client
WAVEFORMATEX *pwfx = NULL;
REFERENCE_TIME hnsRequestedDuration = 10000000;
REFERENCE_TIME hnsActualDuration;

pDeviceEndpoint->Activate(IID_IAudioClient, CLSCTX_ALL, NULL, (void**)&pAudioClient);
pAudioClient->GetMixFormat(&pwfx);

pAudioClient->Initialize(AUDCLNT_SHAREMODE_SHARED, AUDCLNT_STREAMFLAGS_LOOPBACK, hnsRequestedDuration, 0, pwfx, NULL);
channels = pwfx->nChannels;

pAudioClient->GetService(IID_IAudioCaptureClient, (void**)&pCaptureClient);
pAudioClient->Start();  // Start recording.

Capture of packets (note that std::mutex packet_buffer_mutex and vector<vector<float>> packet_buffer are already defined and used by another thread to safely display the data):

UINT32 packetLength = 0;
BYTE *pData = NULL;
UINT32 numFramesAvailable;
DWORD flags;
int max_packets = 8;

std::unique_lock<std::mutex> write_guard(packet_buffer_mutex, std::defer_lock);

while (true) {
    pCaptureClient->GetNextPacketSize(&packetLength);
    while (packetLength != 0)
    {
        // Get the available data in the shared buffer.
        pData = NULL;
        pCaptureClient->GetBuffer(&pData, &numFramesAvailable, &flags, NULL, NULL);

        if (flags & AUDCLNT_BUFFERFLAGS_SILENT)
        {
            pData = NULL;  // Silent packet: an empty packet is stored below.
        }

        write_guard.lock();
        if (packet_buffer.size() == max_packets) {
            packet_buffer.pop_back();
        }

        if (pData) {
            float * pfData = (float*)pData;
            packet_buffer.emplace(packet_buffer.begin(), pfData, pfData + channels * numFramesAvailable);
        } else {
            packet_buffer.emplace(packet_buffer.begin());
        }
        write_guard.unlock();

        pCaptureClient->ReleaseBuffer(numFramesAvailable);
        pCaptureClient->GetNextPacketSize(&packetLength);
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
}

I store the packets in a vector<vector<float>> (where each vector<float> is a packet) removing the last one and inserting the newest at the start so I can iterate over them in order. Below is the result of a captured sinewave, plotting alternating values so it only represents a single channel. It is clear where the packets are being stitched together.

Looks like you copied the code [from here](https://learn.microsoft.com/en-us/windows/win32/coreaudio/capturing-a-stream). The SetFormat() call is missing, not good. — Hans Passant, Oct 01 '20 at 15:53
isn't SetFormat just a user defined function that informs how to copy the data, something I handle myself when I convert a packet to a `vector`? — KyleL, Oct 01 '20 at 15:59
How often do you see `AUDCLNT_BUFFERFLAGS_DATA_DISCONTINUITY` set in the returned `flags`? — Roman R., Oct 19 '20 at 06:09

score 0 · Answer 1 · answered Nov 19 '20 at 14:34

Something is playing a sine wave to Windows; you're recording the sine wave back in the audio loopback; and the sine wave you're getting back isn't really a sine wave.

You're almost certainly running into glitches. The most likely causes of glitching are:

Whatever is playing the sine wave to Windows isn't getting data to Windows in time, so the buffer is running dry.
Whatever is reading the loopback data out of Windows isn't reading the data in time, so the buffer is filling up.
Something is going wrong in between playing the sine wave to Windows and reading it back.

It is possible that more than one of these is happening.

The IAudioCaptureClient::GetBuffer call will tell you if you read the data too late. In particular it will set *pdwFlags so that the AUDCLNT_BUFFERFLAGS_DATA_DISCONTINUITY bit is set.
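One way to check this, sketched below, is to count how often the discontinuity flag is set and to compare each packet's device position with where the previous packet ended. GetBuffer writes the device position of the packet's first frame through its fourth parameter if you pass a pointer instead of NULL. The `PacketContinuityChecker` type and its member names are hypothetical helpers, not part of WASAPI, and the flag constants are copied from the `_AUDCLNT_BUFFERFLAGS` enum only so the sketch compiles without the Windows SDK.

```cpp
#include <cstdint>

// Values of the _AUDCLNT_BUFFERFLAGS enum in audioclient.h, repeated here
// only so this sketch builds without the Windows headers.
constexpr std::uint32_t kFlagDataDiscontinuity = 0x1;
constexpr std::uint32_t kFlagSilent            = 0x2;

// Hypothetical helper: tracks whether consecutive packets line up.
struct PacketContinuityChecker {
    bool          havePrevious           = false;
    std::uint64_t expectedPosition       = 0;  // where the next packet should start, in frames
    std::uint64_t flaggedDiscontinuities = 0;  // packets with the discontinuity flag set
    std::uint64_t positionMismatches     = 0;  // packets that did not start where expected
    std::int64_t  lastGapFrames          = 0;  // > 0: frames missing, < 0: frames overlapping

    // Call once per packet with the values GetBuffer returned.
    // Returns true if the packet follows the previous one with no reported problem.
    bool onPacket(std::uint64_t devicePosition, std::uint32_t numFrames, std::uint32_t flags) {
        bool contiguous = true;
        if (flags & kFlagDataDiscontinuity) {
            ++flaggedDiscontinuities;
            contiguous = false;
        }
        if (havePrevious && devicePosition != expectedPosition) {
            ++positionMismatches;
            lastGapFrames = static_cast<std::int64_t>(devicePosition - expectedPosition);
            contiguous = false;
        }
        expectedPosition = devicePosition + numFrames;
        havePrevious = true;
        return contiguous;
    }
};
```

In the capture loop this would be called as something like `checker.onPacket(devicePosition, numFramesAvailable, flags)` right after GetBuffer, with `&devicePosition` (a UINT64) passed as the fourth argument. The method only updates counters, so it is cheap enough to run between GetBuffer and ReleaseBuffer; print the counters from another thread.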

Looking at your code, I see you're doing the following things between the GetBuffer and the ReleaseBuffer:

Waiting on a lock
Sometimes doing something called "pop_back"
Doing something called "emplace"

I quote from the IAudioCaptureClient::GetBuffer documentation:

Clients should avoid excessive delays between the GetBuffer call that acquires a packet and the ReleaseBuffer call that releases the packet. The implementation of the audio engine assumes that the GetBuffer call and the corresponding ReleaseBuffer call occur within the same buffer-processing period. Clients that delay releasing a packet for more than one period risk losing sample data.

In particular you should NEVER DO ANY OF THE FOLLOWING between GetBuffer and ReleaseBuffer because eventually they will cause a glitch:

Wait on a lock
Wait on any other operation
Read from or write to a file
Allocate memory

Instead, pre-allocate a bunch of memory before calling IAudioClient::Start. As each buffer arrives, write to this memory. On the side, have a regularly scheduled work item that takes written memory and writes it to disk or whatever you're doing with it.
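A minimal sketch of that idea, assuming exactly one capture thread and one consumer thread: a fixed-size ring buffer allocated once up front, where the capture thread only copies samples and publishes an index. `SampleRingBuffer` and its method names are hypothetical, and a real implementation would need to decide what to do when the consumer falls behind (this one simply rejects the packet).

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical single-producer / single-consumer ring buffer of float samples.
// All memory is allocated in the constructor, before capture starts.
class SampleRingBuffer {
public:
    explicit SampleRingBuffer(std::size_t capacity)
        : storage_(capacity + 1) {}  // one slot stays empty to tell "full" from "empty"

    // Capture thread: copy `count` samples in. No locks, no allocation.
    // A null `samples` pointer writes silence (zeros), e.g. for a silent packet.
    // Returns false and writes nothing if there is not enough free space.
    bool write(const float* samples, std::size_t count) {
        const std::size_t size = storage_.size();
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        const std::size_t freeSlots = (tail + size - head - 1) % size;
        if (count > freeSlots) return false;
        for (std::size_t i = 0; i < count; ++i)
            storage_[(head + i) % size] = samples ? samples[i] : 0.0f;
        head_.store((head + count) % size, std::memory_order_release);
        return true;
    }

    // Consumer thread: copy out up to `maxCount` samples; returns how many were copied.
    std::size_t read(float* out, std::size_t maxCount) {
        const std::size_t size = storage_.size();
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        const std::size_t available = (head + size - tail) % size;
        const std::size_t n = std::min(available, maxCount);
        for (std::size_t i = 0; i < n; ++i)
            out[i] = storage_[(tail + i) % size];
        tail_.store((tail + n) % size, std::memory_order_release);
        return n;
    }

private:
    std::vector<float>       storage_;
    std::atomic<std::size_t> head_{0};  // next slot to write (owned by the capture thread)
    std::atomic<std::size_t> tail_{0};  // next slot to read (owned by the consumer thread)
};
```

With something like this, the only work between GetBuffer and ReleaseBuffer would be `ring.write(pfData, channels * numFramesAvailable)`. The display thread calls `read` on its own schedule and does any locking or vector building there, where a delay cannot cost you a packet.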

`std::vector::emplace` could be a memory allocation, depending on `std::vector::capacity`. Since the example has no path to reserve extra capacity outside the `GetBuffer`/`ReleaseBuffer` span, that means the vector will occasionally grow. But the worst part is likely the lock; a small memory allocation like this is pretty fast. — MSalters, Sep 20 '21 at 09:42
