How to synchronize audio with the power spectrum and choose frame length N (to do FFT)?

Question

I am writing a music visualizer program in C++. It displays the frequency spectrum of the audio input. I used Aquila-dsp to get the audio samples, Kiss-fft to do the FFT, and SFML to play the audio. The input is in .wav format. OpenGL is used to plot the graph.

Algorithm Used:

1. *framePointer = 0, N = 10000;*
2. Load audio file and play it using SFML.
3. For *i* = *framePointer* to *framePointer* + *N*, while *framePointer* + *N* < *total_samples_count*:

Collect audio samples.

4. Apply Window Function (Hann window) 
5. Apply *FFT*
6. Calculate the magnitude of the first N/2 *FFT* bins

 *Magnitude* = sqrt( re * re + im * im)

7. Convert to dB(log) scale (optional)

  10*log(magnitude)

8. Plot the N/2 log(magnitude) values
9. If *framePointer* >= *total_samples_count - N*

Exit

Else go to step 3.

    #define N 10000
    int framePointer = 0;

    void getData()
    {
        int i, j, x;
        Aquila::WaveFile wav(fileName);
        double mag[N/2];

        double roof = wav.getSamplesCount();

        // Get first N samples
        for (i = framePointer, j = 0;
             i < (framePointer + N) && framePointer < roof - N;
             i++, j++) {
            // Apply window function on the sample
            double multiplier = 0.5 * (1 - cos(2*M_PI*j/(N-1)));
            in[j].r = multiplier * wav.sample(i);
            in[j].i = 0;  // stores N samples
        }

        if (framePointer < roof - N - 1) {
            framePointer = i;
        }
        else {
            printf("Frame pointer > roof - N \n");
            printf("Framepointer = %d\n", framePointer);

            // Get total time and exit
            timestamp_t t1 = get_timestamp();
            double secs = (t1 - tmain) / 1000000.0L;
            std::cout << "Program exit.\nTotal time: " << secs << std::endl;
            exit(0);
        }

        // Apply FFT
        getFft(in, out);

        // Calculate magnitude of first N/2 FFT bins
        for (i = 0; i < N/2; i++) {
            mag[i] = sqrt((out[i].r * out[i].r) + (out[i].i * out[i].i));
            graph[i] = log(mag[i]) * 10;
        }
    }
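
A side note on steps 6 and 7: in C++ `log()` is the natural logarithm, whereas a decibel value is conventionally computed with a base-10 logarithm, as 20*log10(magnitude) for an amplitude spectrum (equivalently 10*log10(power)). A minimal sketch of that conversion follows. The helper name `binToDb` and the `floorDb` limit are assumptions made for illustration and are not part of the program above.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical helper: convert one FFT bin (re, im) to decibels.
// floorDb is an assumed lower limit so that a silent bin does not
// produce minus infinity.
double binToDb(double re, double im, double floorDb = -120.0) {
    double magnitude = std::sqrt(re * re + im * im);
    if (magnitude <= 0.0) {
        return floorDb;                          // log10(0) is undefined
    }
    double db = 20.0 * std::log10(magnitude);    // amplitude -> dB
    return std::max(db, floorDb);                // clamp very small values
}
```

Inside the magnitude loop this could be used as `graph[i] = binToDb(out[i].r, out[i].i);`.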

I plot the graph using OpenGL. Full source code

The problem I have is choosing the frame length (the value of N).

For a certain length of audio having:

Length: 237191 ms
Sample frequency: 44100 Hz
Channels: 2
Byte rate: 172 kB/s
Bits per sample: 16b

The graph is synchronized with the audio if I choose N = 10000. Or at least it stops when the audio ends.

How do I choose N (the frame length) so that the audio is synchronized with the spectrum? The audio is dual channel; will this algorithm work for that?

Just shooting from the hip, but why not use a rolling window, so the transform window moves forward at the same rate as the audio sample stream? An optimization may be to re-use computations that 'overlap', i.e. cache computations you did last time, before moving the window. – Erik Alapää, May 24 '16 at 13:42
IMO, using `glutIdleFunc()` is not 'easy' in this sort of synchronizing. `glutTimerFunc()` could perhaps be a better choice. Another possibility would be, if your code is using a callback for generating spectra, to call `glutPostRedisplay()` at the end of each callback's render cycle, to start with. — user3078414, May 24 '16 at 14:03
@user3078414 I am not very good at OpenGL. I have used this graph from `https://en.wikibooks.org/wiki/OpenGL_Programming/Scientific_OpenGL_Tutorial_02` to plot the spectrum. — Indra, May 24 '16 at 15:02

score 1 · Accepted Answer · answered May 24 '16 at 13:46


Start by deciding how often you want the visualizer to update. Let's say we want it to update 25 times per second (similar to TV or movie frame rates). That means one update every 1/25 of a second, i.e. every 40 ms. At a sample rate of 44.1 kHz this translates to 44100 / 25 = 1764 samples. Since we typically want a power-of-2 FFT size, let's go for N = 2048.

This gives a resolution on the frequency axis of 44100 / 2048 ≈ 21.5 Hz. If you want higher resolution you can overlap successive FFT windows: e.g. keeping the same update rate and overlapping by 50%, you can have N = 4096 for a resolution of about 10.8 Hz.
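
The arithmetic above can be sketched as a small helper. This is only an illustration under stated assumptions: the names `nextPowerOfTwo`, `FrameParams`, `chooseFrameParams` and the `sizeMultiplier` parameter are hypothetical, and the FFT size is simply the next power of 2 at or above the number of samples per update, optionally doubled to get overlapping windows.

```cpp
#include <cstddef>

// Hypothetical helper: smallest power of two that is >= n (for n >= 1).
std::size_t nextPowerOfTwo(std::size_t n) {
    std::size_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

struct FrameParams {
    std::size_t hop;      // samples to advance between visual updates
    std::size_t fftSize;  // FFT length N (a power of two)
    double binWidthHz;    // frequency resolution = sampleRate / N
};

// sizeMultiplier = 1 picks the smallest power-of-two N covering one update;
// sizeMultiplier = 2 doubles N at the same update rate, so successive
// windows overlap by roughly half.
FrameParams chooseFrameParams(unsigned sampleRate, unsigned updatesPerSecond,
                              unsigned sizeMultiplier = 1) {
    FrameParams p;
    p.hop = sampleRate / updatesPerSecond;              // 44100 / 25 = 1764
    p.fftSize = nextPowerOfTwo(p.hop) * sizeMultiplier; // 2048, or 4096 when doubled
    p.binWidthHz = static_cast<double>(sampleRate) / p.fftSize;
    return p;
}
```

For example, `chooseFrameParams(44100, 25)` yields a hop of 1764 samples and N = 2048, while `chooseFrameParams(44100, 25, 2)` keeps the hop and uses N = 4096. The frame pointer then advances by `hop`, not by N, between updates.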

answered May 24 '16 at 13:46

Paul R


When `N = 2048`, the music finishes at `Framepointer = 2942976` out of a total of `10460160` samples. – Indra May 24 '16 at 14:52
@Indrajith: it sounds like your code isn't actually synchronizing the visualizer with the audio playback - you need to add some timing code to ensure that the frame you are processing corresponds with the frame that is being played. – Paul R May 24 '16 at 15:14
@Paul R I think it is because of the algorithm complexity when we take N as `2048`. The most it can process is `2942976` frames. How can we increase the processing speed? Can we decrease the audio processing time without affecting the music being played? – Indra May 24 '16 at 15:50
I suggest you simplify the task and get one part working at a time - since you're playing audio from a file you first need to **synchronize** your visualizer processing to the playback. Once you have that working then the rest should be easy. – Paul R May 24 '16 at 15:54
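
One way to do the synchronization suggested in the comments above is to derive the frame start from the elapsed playback time instead of advancing a counter on every FFT. The sketch below is a hedged illustration, not the poster's code: `frameStartForTime` is a hypothetical helper, and it assumes that sample index `i` corresponds to time `i / sampleRate` (one channel, or per-channel indexing). The elapsed time would have to come from the audio library; with SFML 2.x, for instance, `getPlayingOffset().asSeconds()` on the playing sound is one candidate source.

```cpp
#include <cstddef>

// Hypothetical helper: index of the first sample of the analysis frame to
// show once playbackSeconds of audio have been played. The result is
// clamped so that frameStart + fftSize never runs past totalSamples.
std::size_t frameStartForTime(double playbackSeconds, unsigned sampleRate,
                              std::size_t fftSize, std::size_t totalSamples) {
    if (totalSamples < fftSize) {
        return 0;                         // not enough data for a full frame
    }
    if (playbackSeconds < 0.0) {
        playbackSeconds = 0.0;            // guard against negative clock values
    }
    std::size_t pos = static_cast<std::size_t>(playbackSeconds * sampleRate);
    std::size_t lastStart = totalSamples - fftSize;
    return pos < lastStart ? pos : lastStart;
}
```

Each time the display is redrawn, the visualizer would read the playback time, compute the frame start, and run the FFT on that frame. If processing is slower than playback, frames are skipped instead of the graph falling behind the music.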
