
I am trying to visualize a spectrum where the frequency range is divided into N bars, spaced either linearly or logarithmically. The FFT seems to work fine, but I am not sure how to interpret the values in order to decide the maximum bar height for the visualization. I am using FMODAudio, a C# wrapper for FMOD, and it is set up correctly.

In the case of a linear spectrum, the bars are defined as follows:

public int InitializeSpectrum(int windowSize = 1024, int maxBars = 16)
{
    numSamplesPerBar_Linear.Clear();
    int barSamples = (windowSize / 2) / maxBars;

    for (int i = 0; i < maxBars; ++i)
    {
        numSamplesPerBar_Linear.Add(barSamples);
    }
    IsInitialized = true;
    Data = new float[numSamplesPerBar_Linear.Count];
    return numSamplesPerBar_Linear.Count;
}

Data is the array which holds the spectrum values received from the update loop.

The update looks like this:

public unsafe void UpdateSpectrum(ref ParameterFFT* fftData)
{
    int length = fftData->Length / 2;
    if (length > 0)
    {
        int indexFFT = 0;
        for (int index = 0; index < numSamplesPerBar_Linear.Count; ++index)
        {
            Data[index] = 0f; // reset so the previous update's value does not accumulate into this one
            for (int frec = 0; frec < numSamplesPerBar_Linear[index]; ++frec)
            {
                for (int channel = 0; channel < fftData->ChannelCount; ++channel)
                {
                    var floatspectrum = fftData->GetSpectrum(channel); // this is a ReadOnlySpan<float> by default.
                    Data[index] += floatspectrum[indexFFT];
                }
                ++indexFFT;
            }

            // Average over the bins in this bar and over all channels for more meaningful values.
            Data[index] /= (float)(numSamplesPerBar_Linear[index] * fftData->ChannelCount);
        }
    }
}

The values I get when testing a song are very low across the bands. A randomly chosen moment while playing a song gives these values for 16 bars: 0.0326, 0.0031, 0.001, 0.0003, 0.0004, 0.0003, 0.0001, 0.0002, 0.0001, 0.0001, 0.0001, 0, 0, 0, 0, 0

I realize it's more useful to use a logarithmic spectrum in many cases, and I intend to, but I still need to figure out how to find the max values for each bar so that I can set up the visualization on a proper scale.

Q: How can I know the potential max values for each bar based on this setup (it's not 1.0)?

Alx

2 Answers


The output of an FFT call is an array in which each element is a complex number (A + Bi), where A is the real component and B is the imaginary component. Element zero of this array represents frequency zero (DC, the offset bias), which can typically be ignored. As you iterate across the array, the frequency increases by a fixed increment per element. This frequency increment is calculated as follows:

// Audio_samples: array of raw audio samples in PCM format that is fed into the FFT call

num_fft_bins := float64(len(Audio_samples)) / 2.0 // usable bins up to the Nyquist frequency

freq_incr_per_bin := (input_audio_sample_rate / 2.0) / num_fft_bins

So, to answer your question: the output array from the FFT call is a linear progression of frequencies, evenly spaced by the frequency increment above.
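
As an illustration of that spacing, here is a minimal Python sketch (not FMOD code). It assumes `window_size` samples went into the FFT and that only the first `window_size // 2` bins are used. The helper names `bin_frequencies` and `log_bar_edges` are hypothetical, and the geometric grouping is just one possible way to build the logarithmic bars mentioned in the question.

```python
def bin_frequencies(window_size, sample_rate):
    """Return the frequency in Hz of each of the window_size // 2 usable bins."""
    num_bins = window_size // 2
    freq_incr_per_bin = (sample_rate / 2.0) / num_bins
    return [i * freq_incr_per_bin for i in range(num_bins)]


def log_bar_edges(num_bins, max_bars):
    """Return max_bars + 1 bin indices; bar b covers bins edges[b] .. edges[b + 1] - 1.

    Bin 0 (DC) is skipped. Edges grow geometrically from 1 to num_bins and are
    forced to be strictly increasing so that no bar is empty.
    """
    edges = []
    for b in range(max_bars + 1):
        edge = round(num_bins ** (b / max_bars))
        if edges and edge <= edges[-1]:
            edge = edges[-1] + 1  # low bars would otherwise collapse onto the same bin
        edges.append(edge)
    return edges


freqs = bin_frequencies(window_size=1024, sample_rate=48000)
edges = log_bar_edges(num_bins=len(freqs), max_bars=16)
for b in range(16):
    lo, hi = edges[b], edges[b + 1] - 1
    print(f"bar {b:2d}: bins {lo:3d}-{hi:3d}, {freqs[lo]:8.1f} Hz to {freqs[hi]:8.1f} Hz")
```

With a linear layout every bar covers the same number of bins, so most bars sit in the high frequencies where music carries little energy. The geometric layout gives the low bars only a bin or two each and the top bars a hundred or more.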

Scott Stensland

Depends on your input data to the FFT, and the scaling that your particular FFT implementation uses (not all FFTs use the same scale factor).

With an energy-preserving forward FFT, Parseval's theorem applies: the energy (sum of squares) of the input vector equals the energy of the FFT result vector. Note that for a pure tone, meaning a sinusoid that completes a whole number of periods within the FFT window, all of that energy can appear in a single FFT result element (for real-valued input it is shared between that element and its mirror image at the negative frequency). So if you know the maximum possible input energy, you can use that to compute the maximum possible result element magnitude for scaling purposes.
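
A small Python check of this, using a naive unnormalized DFT so it needs only the standard library. The unnormalized convention is an assumption here; with it, Parseval's relation carries a 1/N factor, and a full-scale real sine of amplitude 1 peaks at N/2 in its bin. Whether FMOD applies its own scaling on top is not established in this thread, so treat the numbers as illustrating the principle rather than FMOD's actual output range.

```python
import cmath
import math


def dft(samples):
    """Naive unnormalized forward DFT, O(n^2); fine for a small demonstration."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


n = 64       # window size (hypothetical)
cycles = 4   # whole number of periods in the window, i.e. a pure tone on bin 4
tone = [math.sin(2 * math.pi * cycles * t / n) for t in range(n)]
magnitudes = [abs(x) for x in dft(tone)]

time_energy = sum(s * s for s in tone)
freq_energy = sum(m * m for m in magnitudes) / n  # Parseval, unnormalized DFT
peak_bin = max(range(n // 2), key=lambda k: magnitudes[k])

print(time_energy, freq_energy)        # the two energies agree
print(peak_bin, magnitudes[peak_bin])  # peak sits on bin 4 with magnitude n / 2
```

Under this convention, dividing each magnitude by N/2 maps a full-scale sine to 1.0, which gives an upper bound for a single bin. A bar that averages many bins will stay well below that bound for typical music.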

The range is often large enough that visualizers commonly need to use log scaling, or else typical input can get pixel quantized to a graph of all zeros.
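
One common way to apply log scaling is to convert each magnitude to decibels and map a fixed dB window onto the bar height. The sketch below is a minimal Python version. The function name `bar_height`, the -80 dB floor, and the reference magnitude of 1.0 are all assumptions chosen for illustration, not values taken from FMOD.

```python
import math


def bar_height(magnitude, floor_db=-80.0, ref=1.0):
    """Map a linear magnitude to a bar height in [0, 1] on a decibel scale.

    ref is the magnitude treated as 0 dB (full height); anything at or below
    floor_db is drawn with height 0.
    """
    if magnitude <= 0.0:
        return 0.0
    db = 20.0 * math.log10(magnitude / ref)
    return min(1.0, max(0.0, (db - floor_db) / -floor_db))


# The sample values quoted in the question.
bars = [0.0326, 0.0031, 0.001, 0.0003, 0.0004, 0.0003, 0.0001, 0.0002]
for value in bars:
    print(f"{value:.4f} -> {bar_height(value):.2f}")
```

On a linear scale the second bar is already less than a tenth of the first, while on the dB scale it still reaches a clearly visible fraction of the full height.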

hotpaw2