
I am trying to build an application that visualizes the waveform of generated audio samples. The application continuously generates audio samples, fills a buffer, and supplies that buffer to WASAPI for playback. The user can change the frequency of the generated samples by pressing keyboard keys, like playing keys on a piano (in the range C4 to B4, or 261.6256 Hz to 493.8833 Hz). This is a minimal version of the sample-generation code.

void Update()
{
    static float prev_freq = 0;
    //process input & events
    {
        if (main_window->IsKeyPressed(KeyCode::C))
        {
            pressed = PIANO::C;
            frequency = C4_FREQUENCY;
        }
        else if (main_window->IsKeyPressed(KeyCode::F))
        {
            pressed = PIANO::C_SHARP;
            frequency = C4_SHARP_FREQUENCY;
        }
        //... remaining key mappings elided
        else
        {
            pressed = PIANO::NONE;
            frequency = 0.0f;
        }
    }

    if (prev_freq != frequency)
    {
        prev_freq = frequency;
        time_held = 0;
    }

    
    //grab all available audio frames
    static float time = 0;
    HRESULT hr;
    UINT32 num_frames, padding_frames;

    hr = pAudioClient->GetBufferSize(&num_frames);
    hr = pAudioClient->GetCurrentPadding(&padding_frames);
    available_frames = num_frames - padding_frames;

    //write data to audio frames
    hr = pAudioRenderClient->GetBuffer(available_frames, (BYTE**)&buffer);
    for (unsigned int i = 0; i < available_frames; i++)
    {
        float amp = max_amplitude * sin(2 * PI * frequency * time);
        buffer[i] = (int16_t)amp;
        time += 1.f / wave_fmt->nSamplesPerSec;
    }
    hr = pAudioRenderClient->ReleaseBuffer(available_frames, NULL);
    
    //time one key has been held in milliseconds
    time_held += (frequency == 0 ? (-time_held) : dt);
}
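For reference, the C4–B4 frequencies above follow from twelve-tone equal temperament, so per-key constants like C4_FREQUENCY can be derived rather than hard-coded. A quick sketch (NoteFrequency is a hypothetical helper, not part of the code above):

```cpp
#include <cmath>

// Twelve-tone equal temperament, anchored at A4 = 440 Hz (MIDI note 69).
// Each semitone multiplies the frequency by the twelfth root of two.
double NoteFrequency(int midi_note)
{
    return 440.0 * std::pow(2.0, (midi_note - 69) / 12.0);
}
```

With this, C4 (MIDI note 60) comes out to 261.6256 Hz and B4 (MIDI note 71) to 493.8833 Hz, matching the range mentioned above.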

The audio generation works well enough so far, but now I'm trying to achieve a real-time visualization of the generated audio waveform, i.e., the application should display a wave that reflects what the speakers are currently playing. I had a couple of ideas for achieving this, but none worked out.

  1. Directly retrieve a pointer to the audio buffer next to be played from WASAPI. As far as I can tell there is no way to do this, but I'm curious if there's a workaround.

  2. Approximate the waveform using the most recently submitted buffer's data. Given that the buffer only holds 0.01 seconds of audio, I figure the visual delay won't be very noticeable. The only problem with this method is that it produces overlapping sine waves at certain frequencies, instead of the single sine wave I'm expecting. I suspect the issue has to do with timing/synchronization between the audio generation loop and the render loop. This is my implementation of this solution.

void Render()
{
     for (int i = 0; i < available_frames; i++)
     {
         ball_y = main_window->GetHeight() / 2;
         float deflection = (float)buffer[i] / (float)max_amplitude;
         deflection *= DEFAULT_AMPLITUDE_DEFLECTION;
         ball_y += deflection;
         ball_x = ((float)i / (float)available_frames);
         ball_x *= main_window->GetWidth();
         main_window->DrawCircle(ball_x, ball_y, 2);
     }
}
  3. Approximate the waveform based on how long a key is held. This method produces a slight jittering effect, and the overlapping sine wave problem persists, so this solution is worse than #2.
void Update()
{
    ...
    //time one key has been held in milliseconds
    time_held += (frequency == 0 ? (-time_held) : dt);

}


void Render()
{
     float time_held_seconds = time_held / 1000.f;
     for (int i = 0; i < available_frames; i++)
     {
           //get the number of samples generated since the key was pressed
           unsigned int generated_samples = wave_fmt->nSamplesPerSec * (time_held_seconds);
           //modulo number of samples by buffer size to get current index in buffer
           unsigned int buffer_position = generated_samples % available_frames;
           unsigned int buffer_index = (buffer_position + i) % available_frames;
           //convert to screen space coordinates
           ball_y = main_window->GetHeight() / 2;
           float deflection = ((float)buffer[buffer_index] / (float)max_amplitude);
           deflection *= DEFAULT_AMPLITUDE_DEFLECTION;
           ball_y += deflection;
           ball_x = ((float)i / (float)available_frames);
           ball_x *= main_window->GetWidth();
           main_window->DrawCircle(ball_x, ball_y, 2);
     }
}
  4. While debugging, I noticed that if I add a std::cout statement inside the render loop (`for (int i = 0; i < available_frames; i++)`), the overlapping of sine waves virtually disappears, with the drawback that the application slows to a crawl. This is what led me to believe the issue lies with timing or synchronization. With this pseudo-fix in mind, I then tried implementing a controlled delay inside the render loop, i.e., locking the framerate. The drawback here is that, depending on TARGET_FPS, the application either has a pronounced "flickering" effect with little to no sine wave overlap (when TARGET_FPS is in the 1-30 range) or little to no flickering but pronounced overlap of sine waves (when TARGET_FPS is 300+). With TARGET_FPS between 30 and 300, both the flickering and the sine wave overlap are prominent.
void OnRender()
{
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&end_t);

    LONGLONG dt = end_t.QuadPart - start_t.QuadPart;
    float cycle_time = dt / (float)freq.QuadPart;

    if (cycle_time < (1.f / TARGET_FPS))
        return;
    start_t = end_t;
    
    for (int i = 0; i < available_frames; i++)
    {
           //get the number of samples generated since the key was pressed
           unsigned int generated_samples = wave_fmt->nSamplesPerSec * (time_held_seconds);
           //modulo number of samples by buffer size to get current index in buffer
           unsigned int buffer_position = generated_samples % available_frames;
           unsigned int buffer_index = (buffer_position + i) % available_frames;
           //convert to screen space coordinates
           ball_y = main_window->GetHeight() / 2;
           float deflection = ((float)buffer[buffer_index] / (float)max_amplitude);
           //std::cout << deflection << '\n'; <-- uncommenting this indirectly fixes the overlapping sine waves issue but slows the audio and the program to a crawl
           deflection *= DEFAULT_AMPLITUDE_DEFLECTION;
           ball_y += deflection;
           ball_x = ((float)i / (float)available_frames);
           ball_x *= main_window->GetWidth();
           main_window->DrawCircle(ball_x, ball_y, 2);
    }
}
  5. I thought the flickering might have something to do with the background being cleared before every call to Render(), so I disabled background clearing to test this. The sine waves are very clearly being drawn at phase-offset positions.

This is as far as I could get. Any help is greatly appreciated. I apologize for the length and for any formatting mistakes; this is my first post. Thanks.

Edit: This is how the application runs the Update and Render methods.

void Application::Run()
{
    while (running)
    {
        Update();
        Render();
    }
}

Edit #2: Updated to use a ring buffer and an event-driven callback

//constructor creates worker thread
SoundSynthApp::SoundSynthApp()
{
    //initialize buffer with 0 amplitude
    UINT32 num_frames, padding_frames;

    hr = pAudioClient->GetBufferSize(&num_frames);
    hr = pAudioClient->GetCurrentPadding(&padding_frames);
    available_frames = num_frames - padding_frames;
    ring_buffer_size = (size_t)num_frames;
    ring_buffer = new int16_t[ring_buffer_size];


    hr = pAudioRenderClient->GetBuffer(available_frames, (BYTE**)&buffer);
    for (unsigned int i = 0; i < available_frames; i++)
    {
        int16_t* pWrite = &ring_buffer[write_pos];
        *pWrite = (int16_t)0;
        buffer[i] = *pWrite;
        write_pos = (write_pos + 1) % ring_buffer_size;
        if (write_pos == read_pos)
            read_pos = (read_pos + 1) % ring_buffer_size;
    }
    hr = pAudioRenderClient->ReleaseBuffer(available_frames, NULL);

    //start audio stream
    pAudioClient->Start();

    //start buffer-fill thread
    callback_thread = std::thread(&SoundSynthApp::AudioCallback, this);
}

//Audio Callback thread
void SoundSynthApp::AudioCallback()
{
    while (main_window->IsAlive())
    {
        DWORD callback_signal = WaitForSingleObject(callback_event_handle, INFINITE);
        if(callback_signal == WAIT_OBJECT_0)
        {
            UINT32 num_frames = 0, padding_frames = 0;
            HRESULT hr;
            static float time = 0;

            hr = pAudioClient->GetBufferSize(&num_frames);
            hr = pAudioClient->GetCurrentPadding(&padding_frames);
            available_frames = num_frames - padding_frames;
            //write data to audio frames
            hr = pAudioRenderClient->GetBuffer(available_frames, (BYTE**)&buffer);
        
            for (unsigned int i = 0; i < available_frames; i++)
            {
                //generate audio
                float amp = max_amplitude * sin(2 * PI * frequency * time);
            
                int16_t* pWrite = &ring_buffer[write_pos];
                *pWrite = (int16_t)amp;
                buffer[i] = ring_buffer[write_pos];
                write_pos = (write_pos + 1) % ring_buffer_size;
                if (write_pos == read_pos)
                    read_pos = (read_pos + 1) % ring_buffer_size;
                time += 1.f / wave_fmt->nSamplesPerSec;
            }
            hr = pAudioRenderClient->ReleaseBuffer(available_frames, NULL);
         }
     }    
}

//Render loop
void SoundSynthApp::OnRender()
{
    for (unsigned int i = 0; i < available_frames; i++)
    {
        ball_y = (float)main_window->GetHeight() / 2;
        float deflection = (float)ring_buffer[read_pos]/max_amplitude;
        read_pos = (read_pos + 1) % ring_buffer_size;
        deflection *= DEFAULT_AMPLITUDE_DEFLECTION;
        ball_y += deflection;
        ball_x = ((float)i / (float)available_frames);
        ball_x *= main_window->GetWidth();
        main_window->DrawCircle(ball_x, ball_y, 2);
    }
}

Demonstration

Edit #3: Zero-crossing pinned to left-hand side of window

void Render()
{
    int zero_crossing_index = -1;
    const unsigned int samples_to_render = 400;
    //find the sample where a zero crossing occurs, moving backwards from the most recently written sample
    for (unsigned int i = (unsigned int)ring_buffer_size - 1; i > 0; i--)
    {
        if (ring_buffer[i] >= 0 && ring_buffer[i - 1] <= 0)
        {
            zero_crossing_index = i - 1;
            break;
        }
    }

    if (zero_crossing_index == -1)
        zero_crossing_index = (unsigned int)ring_buffer_size - 1;
    //pin zero-crossing to left side of screen
    for (unsigned int i = 0; i < samples_to_render; i++)
    {
        ball_x = (float)i / (float)samples_to_render;
        ball_x *= (float)main_window->GetWidth();

        float deflection = (float)ring_buffer[((unsigned int)zero_crossing_index + i) % ring_buffer_size];
        deflection /= (float)max_amplitude;
        deflection *= DEFAULT_AMPLITUDE_DEFLECTION;
        ball_y = (float)main_window->GetHeight() / 2;
        ball_y += deflection;
        main_window->DrawCircle(ball_x, ball_y, 1);
    }
}

Zero-crossing

  • Create a sufficiently-sized ring buffer (circular queue) and also deliver the audio into that. You can probably keep the locking light-weight by using an atomic to represent the current tail. The audio thread can write its data into your queue, and _then_ update the tail. The render thread simply grabs the tail and renders `tail - count` samples (where 'count' is however many samples you wish to display). Ballpark ring buffer size would be enough to hold any in-flight audio plus X seconds of prior audio where `X` is _at least_ `1 / FPS`. It would be safer to store much more, _e.g._ 500ms. – paddy Aug 24 '23 at 23:04
  • @paddy Thanks for the quick reply. I looked into ring buffers, but I'm a little confused as to how this would benefit the application. The Render and Update methods run sequentially in a while loop on the same thread for the lifetime of the application. It seems that ring buffering would help with asynchronous reading and writing of the audio data, but in this case how would that change the output, given that the buffer is filled and then immediately read? I've added some code to clarify the application architecture. – J.T. Freeman Aug 25 '23 at 01:02
  • This is one of those cases where an image would help. You've gone to great lengths to explain what you're doing, while overlooking fundamental things like explaining what "sine wave overlap" means. If you're trying to visualize the final audio as you might see on an oscilloscope with automatic sync, you'll need logic to detect zero-crossings such that you pin a crossing to a particular point on the screen. Basically a [phase-locked loop](https://en.wikipedia.org/wiki/Phase-locked_loop) – paddy Aug 25 '23 at 04:25
  • 1
    Regarding the synchronous design of your program, that is generally undesirable for audio. You should consider decoupling your audio generator from the visualization. Displays refresh at a much lower frequency than audio, and all you're doing here is burning energy for no reason. A typical audio application has either a service thread that runs the audio, or sets up a callback that the OS or driver invokes to fill the audio buffer on every output cycle. This should be independent from anything else, especially display. – paddy Aug 25 '23 at 04:32
  • @paddy Got it. I tweaked the code to use ring buffers and event-driven callbacks. I added a gif to hopefully clear up the sine wave overlap issue. The "wobble" or overlap effect is what I'm trying to mitigate. I'd like to see only one wave rendered. The wave produced when the D key is pressed is closest to what I'm envisioning. I don't know if this is a bug or an inherent limitation to rendering audio waves. What's interesting is the jitter and overlap doesn't look nearly as bad when it's recorded as it does in real-time. Seems like the issue could be related to frame rate. – J.T. Freeman Aug 26 '23 at 04:07
  • You _are_ seeing one wave, but its phase depends entirely on when you last drew it. Like I said, if you want to see the wave without phase-shifts, look for a zero-crossing. In your case, a simplistic search backwards from the last sample until you find `sample[i-1] >= 0 && sample[i] <= 0` would probably do it. This also covers the zero-output mode. You could set a maximum search distance and cover the case where this is never true by just pinning to the last sample. Any "tearing" you see is an artifact of screen refresh and possibly also over-drawing, depending on what UI toolkit you're using. – paddy Aug 28 '23 at 00:51
  • @paddy Okay, I see what you mean about the zero-crossing now. I added one more edit with the new code and the rendered visualization. I was wondering if my implementation was correct and, if so, if this was the "tearing" you were talking about? – J.T. Freeman Aug 30 '23 at 02:08
  • No, that's not video tearing. You are searching your ring buffer for zero crossing without taking into account the head or tail position. That's going to land you anywhere within the waveform's history, perhaps even on a boundary between the newest and oldest outputs. – paddy Aug 30 '23 at 02:44
  • @paddy I'm trying to wrap my head around this but I'm extremely confused. I don't understand how, or why to be honest, to take the head/tail position into account when looking for a zero-crossing. If the ring buffer is being filled with samples that correspond to a sine wave's amplitude over time, shouldn't I be okay to start from any zero-crossing? Sorry if I'm missing something obvious, this is my first time working with ring buffers. – J.T. Freeman Sep 01 '23 at 01:08
  • No, because "over time" old data is going to be overwritten by new data, and the start position of that new data cycles through the entire buffer. If you don't search backward from the most recent _new_ data, then there's every chance you'll find crossings in old data, or a mix of old and new. Imagine no key is pressed for long enough to fill the buffer, then you press a key. What are you going to draw? Example of buffer contents: https://imgur.com/a/GHtAQLd – paddy Sep 01 '23 at 03:45

0 Answers