I'm performing cross-correlation between a shorter clip of audio (44100 * 14 samples) and a much longer clip of audio (44100 * 60 * 6 samples). From what I understand, I can't window the FFT because of this. When testing out kiss_fftr and kiss_fftri, I found that the inverse operation returned largely noise (but it was still rhythmically similar to the input). I've confirmed that my input audio is correct and the corruption happens solely within this function:

static std::vector<std::vector<float>> do_fft(std::vector<std::vector<float>> song, std::vector<std::vector<float>> loop)
{
    loop[0].resize(kiss_fftr_next_fast_size_real(loop[0].size())); // TODO: resize this to song size instead of loop size when done testing
    loop[1].resize(loop[0].size()); // TODO: make this dynamic

    std::vector<std::vector<kiss_fft_cpx>> fft_loop;
    std::vector<std::vector<float>> output;

    for (int chan = 0; chan < loop.size(); chan++)
    {
        fft_loop.push_back(std::vector<kiss_fft_cpx>());
        fft_loop[chan].resize(loop[chan].size());

        output.push_back(std::vector<float>());
        output[chan].resize(loop[chan].size()); // TODO: resize this to song size instead of loop size when done testing
    }

    kiss_fftr_cfg cfg_loop = kiss_fftr_alloc(loop[0].size(), 0, NULL, NULL);
    kiss_fftr(cfg_loop, &loop[0][0], &fft_loop[0][0]);
    kiss_fft_free(cfg_loop);

    kiss_fftr_cfg cfgi_loop = kiss_fftr_alloc(fft_loop[0].size(), 1, NULL, NULL);
    kiss_fftri(cfgi_loop, &fft_loop[0][0], &output[0][0]);
    kiss_fft_free(cfgi_loop);

    return output;
}

Here's what the output looks like compared to the input, along with an enlarged view to show detail: [images: input vs. output waveforms]

If you're wondering about memory, the program is 64-bit and only uses a few gigabytes of RAM (nothing major :P)

Ott
    The "silent" bit in the input seems pretty amplified. Seems like a scaling problem. That's called out in the kissfft readme, and it inconveniently depends on the value type used inside kissfft. – MSalters Aug 07 '18 at 13:04

1 Answer

Different FFT libraries use different scaling factors, and/or distribute scaling factors differently between their FFT and IFFT implementations.

kiss_fft does not apply this scaling for you: to get back the original time-domain input vector (to within numeric or rounding error), you have to scale down by the length of the FFT yourself, either during or between the FFT/IFFT pair.

In your case, that's a fairly large scale factor because the length of your data is large.
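
To make the effect concrete, here is a minimal sketch that does not call kiss_fft at all. It uses a naive, unnormalized O(N^2) DFT as a stand-in for a library that leaves scaling to the caller, and the names unscaled_dft and round_trip are made up for this illustration. The assumption is that kiss_fftr/kiss_fftri behave the same way with respect to the factor of N, as described above.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Naive O(N^2) DFT with NO normalization in either direction.
// Hypothetical stand-in for a library transform; not kiss_fft's API.
// sign = -1 gives the forward transform, sign = +1 the inverse.
std::vector<std::complex<double>> unscaled_dft(
    const std::vector<std::complex<double>>& in, int sign)
{
    const std::size_t n = in.size();
    const double two_pi = 2.0 * std::acos(-1.0);
    std::vector<std::complex<double>> out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            // Reduce k*t modulo n to keep the angle small and accurate.
            const double angle = sign * two_pi *
                static_cast<double>((k * t) % n) / static_cast<double>(n);
            sum += in[t] * std::polar(1.0, angle);
        }
        out[k] = sum;
    }
    return out;
}

// Forward transform followed by inverse transform.
// The result is divided by N only when rescale is true.
std::vector<double> round_trip(const std::vector<double>& signal, bool rescale)
{
    std::vector<std::complex<double>> samples(signal.begin(), signal.end());
    const auto spectrum = unscaled_dft(samples, -1);
    const auto back = unscaled_dft(spectrum, +1);

    const double scale =
        rescale ? 1.0 / static_cast<double>(signal.size()) : 1.0;
    std::vector<double> result(signal.size());
    for (std::size_t i = 0; i < signal.size(); ++i) {
        result[i] = back[i].real() * scale;
    }
    return result;
}
```

With rescale set to false, every output sample comes back N times larger than the input. With rescale set to true, the input is recovered to within rounding error. In the question's function, the equivalent step would presumably be multiplying each element of output[0] by 1.0f / loop[0].size() after the kiss_fftri call.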

hotpaw2
  • Thanks! I can't believe I didn't notice that the output was only present during zero-crossings. – Ott Aug 07 '18 at 18:27
  • Hi Ronald! I would like to pick your brain in another problem (again). It's a question related to **kissfft**. Do you mind taking a look at it? [Here is the link](https://stackoverflow.com/q/61872422/176769). – karlphillip May 18 '20 at 15:26