FFT of 16 bit short (real) strided data

Question

I need to compute the FFT of 2 arrays of short data stored like this (replicated millons of times): and so on.

The array values are represented by yellow and blue. Every K values there is a size K space of unused data I need to skip.

I've reordered (and float-casted) the data to get rid of the useless values and used FFTW (with c) to compute the desired transform.

I have two questions regarding this process.

Is there a way to use the in, istride and idist parameters(as seen in the documentation) to use the array as is (without reordering, only casted to float )? I think there's not but maybe I'm burned out.
Is there any good library that could calculate the fft using short data as input? I'll be happy trading some precision for significant speedup. This code will run on an i7.

It sounds like the FFTW "inembed" parameter can be used to skip unused data, but I don't see anything in there about changing the input data type. — zwol, Jun 02 '16 at 16:05

mtrw · Answer 1 · 2016-06-02T21:36:52.640

I am a very satisfied user of KISS FFT by Mark Borgerding. It has an integer transform, and it also has a real-only transform that leverages the Hermitian symmetry inherent in the transform of an exclusively real-valued input to reduce the number of computations required.

However, you will need to keep track of the data locations yourself. The interface is extremely simple and just takes a pointer to an internal data buffer, and input and output buffer addresses.

Assuming your inputs are real-valued, you could do something like:

//Must compile with -DFIXED_POINT=16 directive,
//to tell kiss_fft to do fixed-point transforms with short int data

#include "kiss_fft.h"
#include "kiss_fftr.h"

const size_t K = ...;
const size_t inSize = K/6; //6 based on your diagram above, adjust as needed

kiss_fft_scalar *inBuf = bigInputBuffer;
kiss_fft_cpx outBuf[1+inSize/2]; //Hermitian symmetry, so DC + N/2 complex outputs
//kiss_fft_cpx outBuf[inSize]; //if not using the real transform
size_t ctr = 0;

kiss_fftr_cfg fftCfg = kiss_fftr_alloc(inSize, false, NULL, NULL);
//kiss_fft_cfg fftCfg = kiss_fft_alloc(inSize, false, NULL, NULL); //if not using the real transform

do {
  kiss_fftr(fftCfg, inBuf, outBuf); //kiss_fft(...) is not using the real transform
  //do something with outBuf
  ++ctr;
  if (ctr == 6) {
    inBuf += K;
    ctr = 0;
  } else {
    inBuf += K/6;
  }
} while ((inBuf - bigInputBuffer) < bigBufSize);

Caveats:

You need to recompile to change the data type. Use FIXED_POINT=16 or FIXED_POINT=32 for short or int data, or don't set FIXED_POINT to do floating-point transforms.
FIXED_POINT must have the same value in your source, kiss_fft.* and kiss_fftr.*. So either set it by hand in all files or (better) use -D to pass it as a compiler directive e.g., clang -DFIXED_POINT=16 kiss_fft.c kiss_fftr.c main.c
The real transform expects the input buffer size to be even.
The transform expects the input buffer to have a size that is a product of 2,3,5. I don't remember if it works with other sizes, but even if it does it will be slow.

Thanks! It seems the image was a bit misleading. Both arrays have several millons of values and I need the fft size to be as large as them. The values of the first array are interleaved with the second array values forming chunks as shown. So if I have n chunks (of size K), each array has n*K/2 values. Anyway, thanks for your answer. I'll try it asap.- — papanoel87, Jun 02 '16 at 17:31
Your answer crashes with big (200k) arrays. See if you can run this sample: /// int inSize=262144; kiss_fft_scalar* inBuf = malloc(inSize * sizeof(short)); kiss_fft_cpx* outBuf= malloc(inSize * sizeof(kiss_fft_cpx)); kiss_fftr_cfg fftCfg = kiss_fftr_alloc(inSize, 0, NULL, NULL); int i=0; for(i=0;i — papanoel87, Jun 02 '16 at 19:38
@papanoel87: A size of 262144 works for me. See updated answer, I did not explain the importance of FIXED_POINT very well. — mtrw, Jun 02 '16 at 21:37
It worked as you said. Thanks again. I have another problem now, tho. Check this out: const size_t inSize = 16; kiss_ fft_scalar * inBuf = malloc(inSize * sizeof(short)); kiss_ fft_ cpx outBuf[1+inSize/2]; int i; for (i=0;i — papanoel87, Jun 10 '16 at 17:55
@papanoel87 - yeah, it's rounding. In a fixed point FFT, think of 32767 as equal to your input signal's max value, and 1 as 1.0/32767.0. If you fill your input buffer with 32767 you should see a non-zero output. — mtrw, Jun 10 '16 at 21:22

FFT of 16 bit short (real) strided data

1 Answers1