I want to do the FFT of an audio signal in real time, meaning while the person is speaking in the microphone. I will fetch the data (I do this with portaudio, if it would be easier with wavein I would be happy to use that - if you can tell me how). Next I am using the FFTW library - I know how to perform 1D, 2D (real&complex) FFT, but I am not so sure how to do this, since I would have to do a 3D FFT to get frequency, amplitude (this would determine the color gradient) and time. Or is it just a 2D FFT, and I get amplitude and frequency?
-
1possible duplicate of [Audio spectrum analysis using FFT algorithm in Java](http://stackoverflow.com/questions/6627288/audio-spectrum-analysis-using-fft-algorithm-in-java) – msw Jul 12 '11 at 11:17
-
@rave: it is impolite to keep asking the same question http://stackoverflow.com/questions/6635400/programming-a-spectrogram-using-c – msw Jul 12 '11 at 11:19
-
its not a collage homework, been out of university for about 2months. yes it is very much simpler in matlab but i would like to do it in C. @msw sorry..i am so desperate...been having problems with this for 2 days. I am used to programming in Redhat, but when i switched to windows i had to go through the trouble of learning to use MSVC++ and all its linkings. – Rave Jul 12 '11 at 11:24
-
@Rave: if you would like to do it in C, what have you done so far? show us the code? – Sriram Jul 12 '11 at 11:27
-
@Rave: no offence meant: just that 2 FFT questions popped up together! :) – Mitch Wheat Jul 12 '11 at 11:44
4 Answers
I use a Sliding DFT, which is many times faster than an FFT in the case where you need to do a fourier transform each time a sample arrives in the input buffer.
It's based on the fact that once you have performed a fourier transform for the last N samples, and a new sample arrives, you can "undo" the effect of the oldest sample, and apply the effect of the latest sample, in a single pass through the fourier data! This means that the sliding DFT performance is O(n) compared with O(Log2(n) times n) for the FFT. Also, there's no restriction to powers of two for the buffer size to maintain performance.
The complete test program below compares the sliding DFT with fftw. In my production code I've optimized the below code to unreadibility, to make it three times faster.
#include <complex>
#include <iostream>
#include <time.h>
#include <math_defines.h>
#include <float.h>
#define DO_FFTW // libfftw
#define DO_SDFT
#if defined(DO_FFTW)
#pragma comment( lib, "d:\\projects\\common\\fftw\\libfftw3-3.lib" )
namespace fftw {
#include <fftw/fftw3.h>
}
fftw::fftw_plan plan_fwd;
fftw::fftw_plan plan_inv;
#endif
typedef std::complex<double> complex;
// Buffer size, make it a power of two if you want to improve fftw
const int N = 750;
// input signal
complex in[N];
// frequencies of input signal after ft
// Size increased by one because the optimized sdft code writes data to freqs[N]
complex freqs[N+1];
// output signal after inverse ft of freqs
complex out1[N];
complex out2[N];
// forward coeffs -2 PI e^iw -- normalized (divided by N)
complex coeffs[N];
// inverse coeffs 2 PI e^iw
complex icoeffs[N];
// global index for input and output signals
int idx;
// these are just there to optimize (get rid of index lookups in sdft)
complex oldest_data, newest_data;
//initilaize e-to-the-i-thetas for theta = 0..2PI in increments of 1/N
void init_coeffs()
{
for (int i = 0; i < N; ++i) {
double a = -2.0 * PI * i / double(N);
coeffs[i] = complex(cos(a)/* / N */, sin(a) /* / N */);
}
for (int i = 0; i < N; ++i) {
double a = 2.0 * PI * i / double(N);
icoeffs[i] = complex(cos(a),sin(a));
}
}
// initialize all data buffers
void init()
{
// clear data
for (int i = 0; i < N; ++i)
in[i] = 0;
// seed rand()
srand(857);
init_coeffs();
oldest_data = newest_data = 0.0;
idx = 0;
}
// simulating adding data to circular buffer
void add_data()
{
oldest_data = in[idx];
newest_data = in[idx] = complex(rand() / double(N));
}
// sliding dft
void sdft()
{
complex delta = newest_data - oldest_data;
int ci = 0;
for (int i = 0; i < N; ++i) {
freqs[i] += delta * coeffs[ci];
if ((ci += idx) >= N)
ci -= N;
}
}
// sliding inverse dft
void isdft()
{
complex delta = newest_data - oldest_data;
int ci = 0;
for (int i = 0; i < N; ++i) {
freqs[i] += delta * icoeffs[ci];
if ((ci += idx) >= N)
ci -= N;
}
}
// "textbook" slow dft, nested loops, O(N*N)
void ft()
{
for (int i = 0; i < N; ++i) {
freqs[i] = 0.0;
for (int j = 0; j < N; ++j) {
double a = -2.0 * PI * i * j / double(N);
freqs[i] += in[j] * complex(cos(a),sin(a));
}
}
}
double mag(complex& c)
{
return sqrt(c.real() * c.real() + c.imag() * c.imag());
}
void powr_spectrum(double *powr)
{
for (int i = 0; i < N/2; ++i) {
powr[i] = mag(freqs[i]);
}
}
int main(int argc, char *argv[])
{
const int NSAMPS = N*10;
clock_t start, finish;
#if defined(DO_SDFT)
// ------------------------------ SDFT ---------------------------------------------
init();
start = clock();
for (int i = 0; i < NSAMPS; ++i) {
add_data();
sdft();
// Mess about with freqs[] here
//isdft();
if (++idx == N) idx = 0; // bump global index
if ((i % 1000) == 0)
std::cerr << i << " iters..." << '\r';
}
finish = clock();
std::cout << "SDFT: " << NSAMPS / ((finish-start) / (double)CLOCKS_PER_SEC) << " fts per second." << std::endl;
double powr1[N/2];
powr_spectrum(powr1);
#endif
#if defined(DO_FFTW)
// ------------------------------ FFTW ---------------------------------------------
plan_fwd = fftw::fftw_plan_dft_1d(N, (fftw::fftw_complex *)in, (fftw::fftw_complex *)freqs, FFTW_FORWARD, FFTW_MEASURE);
plan_inv = fftw::fftw_plan_dft_1d(N, (fftw::fftw_complex *)freqs, (fftw::fftw_complex *)out2, FFTW_BACKWARD, FFTW_MEASURE);
init();
start = clock();
for (int i = 0; i < NSAMPS; ++i) {
add_data();
fftw::fftw_execute(plan_fwd);
// mess about with freqs here
//fftw::fftw_execute(plan_inv);
if (++idx == N) idx = 0; // bump global index
if ((i % 1000) == 0)
std::cerr << i << " iters..." << '\r';
}
// normalize fftw's output
for (int j = 0; j < N; ++j)
out2[j] /= N;
finish = clock();
std::cout << "FFTW: " << NSAMPS / ((finish-start) / (double)CLOCKS_PER_SEC) << " fts per second." << std::endl;
fftw::fftw_destroy_plan(plan_fwd);
fftw::fftw_destroy_plan(plan_inv);
double powr2[N/2];
powr_spectrum(powr2);
#endif
#if defined(DO_SDFT) && defined(DO_FFTW)
// ------------------------------ ---------------------------------------------
const double MAX_PERMISSIBLE_DIFF = 1e-11; // DBL_EPSILON;
double diff;
// check my ft gives same power spectrum as FFTW
for (int i = 0; i < N/2; ++i)
if ( (diff = abs(powr1[i] - powr2[i])) > MAX_PERMISSIBLE_DIFF)
printf("Values differ by more than %g at index %d. Diff = %g\n", MAX_PERMISSIBLE_DIFF, i, diff);
#endif
return 0;
}

- 3,151
- 24
- 25
-
1Hi Josh, this optimization looks extremely promising! My scenario involves weighting the input data by a hanning window before each FFT (i.e. I'm not using a rectangular window as your code seems to be doing), do you happen to know if there's a variant of your algorithm that could work with non-rectangular windowed data? Thank you very much in advance! =) – May 29 '12 at 22:36
-
2Haven't done this. But `Hamming(idx) = 0.54 - 0.46 * cos(2PI * idx / (N-1))`. So it, like the FT itself, is amenable to be run stepwise as idx sweeps acrosss the buffer. The line to change is `complex delta = newest_data - oldest_data;` - instead of taking the difference between the oldest and newest data values, the hamming function would need to be applied to both values prior to taking the difference. When I have time, I'll try to modify my code to do this. **EDIT:** You said hanning -- do you mean Hann or Hamming? Hann is `Hann(n) = 0.5 - 0.5 * cos(2PI * n / (N-1))` – Josh Greifer Jun 01 '12 at 12:29
-
Hi Josh, thanks for your prompt reply! You are correct, I'm using a Hann window (I went with the flow and used the popular, yet incorrect, hanning window name, sorry). My doubt regarding your suggested method is, since we are reusing computations from previous iterations, won't we somehow have to visit all the previous samples to introduce the effect of the sliding Hann window? Inspired by your response I reseached a bit, and I see some people are suggesting a convolution on the frequency domain to patch up the response as if it had been Hann-windowed. I'll keep you posted. Thanks again! – Jun 01 '12 at 15:52
-
Yes that sounds right - I can't really figure these things out until I do a dry run of the cumulative changes introduced by the windowing function as it sweeps across the values -- I guess it would probably be possible by using an intermediary buffer, still allowing the whole algorithm to be O(n). – Josh Greifer Jun 01 '12 at 17:16
-
Shouldn't the line in `sdft()` doing the actual computation be `freqs[i] = (freqs[i] + delta) * coeffs[i];` according to [this (sec. 2.3)](http://www.comm.toronto.edu/~dimitris/ece431/slidingdft.pdf)? I mean, you are using it in production and so it is most probably correct of course, but would you mind adding the mathematical derivation? – Sebastian Graf Dec 06 '14 at 13:13
-
1Sebastian, it's been a while since I looked at this code, which I derived "by hand" from first principles. I didn't know at the time that I'd reinvented the SDFT until after I'd implemented this. In fact I was a little disappointed! I'm somewhat puzzled myself by the curious indexing: I think the it's a by-product of my using circular buffers, bumping global pointers and of avoiding any modulo arithmetic in the code. So I guess the maths is textbook, the indexing is not. – Josh Greifer Dec 08 '14 at 15:07
-
2@Sebastian: I was wondering the exact same thing as well. (ie. why isn't `freqs[i]` being multiplied, as it should be if you look at the math) As Josh already said, this is because of the circular buffer, but I'll clarify that a bit: The code isn't calculating the DFT of a signal as directly seen from a window, but one that's also rotated by `i % N` steps. Ie. for `[1, 2, 3, 4, 5, 6, 7, 8, 9]` and `N=5`, when `i=2` the DFT is calculated for `[6, 7, 3, 4, 5]` instead of `[3, 4, 5, 6, 7]`. When rotated like this, all numbers except one stay the same and `freqs[i]` itself isn't multiplied. – Aleksi Torhamo Feb 07 '15 at 01:32
-
1
-
@Mujjingun it's a leftover from a bit of code that used to be there :) – Josh Greifer Dec 14 '15 at 16:13
-
2
If you need amplitude, frequency and time in one graph, then the transform is known as a Time-Frequency decomposition. The most popular one is called the Short Time Fourier Transform. It works as follows:
1. Take a small portion of the signal (say 1 second)
2. Window it with a small window (say 5 ms)
3. Compute the 1D fourier transform of the windowed signal.
4. Move the window by a small amount (2.5 ms)
5. Repeat above steps until end of signal.
6. All of this data is entered into a matrix that is then used to create the kind of 3D representation of the signal that shows its decomposition along frequency, amplitude and time.
The length of the window will decide the resolution you are able to obtain in frequency and time domains. Check here for more details on STFT and search for "Robi Polikar"'s tutorials on wavelet transforms for a layman's introduction to the above.
Edit 1:
You take a windowing function (there are innumerable functions out there - here is a list. Most intuitive is a rectangular window but the most commonly used are the Hamming/Hanning window functions. You can follow the steps below if you have a paper-pencil in hand and draw it along.
Assume that the signal that you have obtained is 1 sec long and is named x[n]
. The windowing function is 5 msec long and is named w[n]
. Place the window at the start of the signal (so the end of the window coincides with the 5ms point of the signal) and multiply the x[n]
and w[n]
like so:
y[n] = x[n] * w[n]
- point by point multiplication of the signals.
Take an FFT of y[n]
.
Then you shift the window by a small amount (say 2.5 msec). So now the window stretches from 2.5ms to 7.5 ms of the signal x[n]
. Repeat the multiplication and FFT generation steps. In other words, you have an overlap of 2.5 msec. You will see that changing the length of the window and the overlap gives you different resolutions on the time and Frequency axis.
Once you do this, you need to feed all the data into a matrix and then have it displayed. The overlap is for minimising the errors that might arise at boundaries and also to get more consistent measurements over such short time frames.
P.S: If you had understood STFT and other time-frequency decompositions of a signal, then you should have had no problems with steps 2 and 4. That you have not understood the above mentioned steps makes me feel like you should revisit time-frequency decompositions also.

- 10,298
- 21
- 83
- 136
-
hey, i understand the theory behind STFT, but i a bit confused on how would you do it in C, what do you mean window it with a a small window, and step 4, move the window ? i don't understand step 2 and 4, if you can can you explain a bit thank you – Rave Jul 12 '11 at 11:50
-
-
hi Sriram, thank you, i found this library, called siglib:http://www.numerix-dsp.com/free/. which has functions for windows such as Hamming and rectangle, etc. also if anyone have any prev experience with siglib, i would love to hear it – Rave Jul 13 '11 at 05:56
-
I am afraid I do not have any experience with siglib. But you are most welcome to try some code out and ask questions on it. Also, if you found this answer helpful, you might want to mark it as the correct answer. – Sriram Jul 13 '11 at 08:31
-
@Sriram - Can the window not just be calculated on the buffer elements of each audio callback? – some_id Jul 01 '13 at 20:52
-
@Helium3: You could do that, but if the number of buffer elements varies each time, then we are looking at a `wavelet transform`. You may need to use the CWT or DWT in that case. – Sriram Jul 03 '13 at 19:23
You can create a realtime FFT by choosing a short time-span and analysing (FFT'ing) just that time-span. You can probably get away with just selecting non-overlapping timespans of say 100-500 milliseconds; the analytically purer way to do this would be using a sliding-window (again of e.g. 100-500 ms), but that is often unnecessary and you can show nice graphics with the non-overlapping timespans without much processing power.

- 23,242
- 4
- 37
- 66
Real-time FFT means completely different from what you just described. It means that for given N and X[N] your algorithm gives Fx[i] while incrementing value i. Meaning, proceeding value does not compute until current value computation completed. This is completely different from what you just described.
Hardware usually uses FFT with around 1k-16k points. Fixed N, not real-time computation. Moving window FFT as described with previous answers.

- 139
- 6