0

I am a computer programmer who work on a Telecommunication project.
In our project I have to change a series of complex number to their Fourier transform.so I need an efficient FFT code for C89 standard.
I am using the following code and it works well enough:

    short FFT(short int dir,long m,double *x,double *y)
{
   long n,i,i1,j,k,i2,l,l1,l2;
   double c1,c2,tx,ty,t1,t2,u1,u2,z;

   /* Calculate the number of points */
   n = 1;
   for (i=0;i<m;i++) 
      n *= 2;

   /* Do the bit reversal */
   i2 = n >> 1;
   j = 0;
   for (i=0;i<n-1;i++) {
      if (i < j) {
         tx = x[i];
         ty = y[i];
         x[i] = x[j];
         y[i] = y[j];
         x[j] = tx;
         y[j] = ty;
      }
      k = i2;
      while (k <= j) {
         j -= k;
         k >>= 1;
      }
      j += k;
   }

   /* Compute the FFT */
   c1 = -1.0; 
   c2 = 0.0;
   l2 = 1;
   for (l=0;l<m;l++) {
      l1 = l2;
      l2 <<= 1;
      u1 = 1.0; 
      u2 = 0.0;
      for (j=0;j<l1;j++) {
         for (i=j;i<n;i+=l2) {
            i1 = i + l1;
            t1 = u1 * x[i1] - u2 * y[i1];
            t2 = u1 * y[i1] + u2 * x[i1];
            x[i1] = x[i] - t1; 
            y[i1] = y[i] - t2;
            x[i] += t1;
            y[i] += t2;
         }
         z =  u1 * c1 - u2 * c2;
         u2 = u1 * c2 + u2 * c1;
         u1 = z;
      }
      c2 = sqrt((1.0 - c1) / 2.0);
      if (dir == 1) 
         c2 = -c2;
      c1 = sqrt((1.0 + c1) / 2.0);
   }

   /* Scaling for forward transform */
   if (dir == 1) {
      for (i=0;i<n;i++) {
         x[i] /= n;
         y[i] /= n;
      }
   }

   return(true);
}

But this code, just support arrays with the size of 2^m.like CLRS book code.
Our arrays which should be transformed, are not in this range and adding zero will be expensive, so I am looking for another solution which help me to have input in any size.
Something like what IT++ and matlab do. but as we want it in pure C using them is impossible.Moreover, IT++ code was blocked as I checked

bahrami307
  • 59
  • 1
  • 9
  • 1
    How about [FFTW](http://www.fftw.org/)? – Carl Norum Aug 20 '13 at 17:51
  • or [KissFFT](http://sourceforge.net/projects/kissfft/) if speed is less of an issue and code size, ease of compilation, and licensing are more of an issue. – Jason B Aug 20 '13 at 18:06
  • I checked both. but I need to have access to the source code for some reasons. and the code is going to be developed in Visual Studio but it will be compiled for MIPS machine. – bahrami307 Aug 20 '13 at 18:54
  • @bahrami307: Both FFTW and Kiss FFT are available in source form. FFTW is available under the GNU General Public License, and Kiss FFT under the BSD License. – Eric Postpischil Aug 20 '13 at 19:18
  • @EricPostpischil: Really? I checked the web page for FFTW and I download a zipped file of version 3.3.3,but I could not find a source. just a .h file was available.and I found the source of Kiss FFT and I am working on, to understand it.thank you anyway. – bahrami307 Aug 20 '13 at 19:51
  • @bahrami307: I suspect you followed the link for the Windows version, which just contains pre-compiled libraries. If you download the files [here](http://www.fftw.org/download.html), you should get the sources. They come with scripts and other files for building them with Unix. Building them with Microsoft tools could be a nuisance. The [page for Windows downloads](http://www.fftw.org/install/windows.html) says they compiled with MinGW. But it has also Visual C++ project files for older versions of FFTW. – Eric Postpischil Aug 20 '13 at 20:12
  • @EricPostpischil: I checked Kiss_fft but I didn`t get the expected answer as I get a FFT and it`s inverse.if the function worked, the input array to FFT block should be same as the output of inverse FFT. I used "kiss_fft_stride" function with parameter -1 and 1 for transform and its reverse. but the result is wrong. – bahrami307 Aug 21 '13 at 08:32

3 Answers3

3

If you are working on any mass-market computing platform (Intel with Windows or OS X, iOS, et cetera), then there is a high-performance FFT implementation supplied by the vendor or manufacturer.

Otherwise, you should evaluate FFTW.

Writing high-performance FFTs for sizes other than powers of two is a complicated task.

If you are going to use your own implementation, then, regarding just the power-of-two sizes:

The implementation you show computes sqrt during the FFT. Most high-performance FFT implementations compute constants ahead of time and store them in a table.

The scaling contains division operations, in x[i] /= n and y[i] /= n. The compiler is likely to implement these as divide instructions. Division is typically a slow instruction on common processors. It would be better to compute scale = 1. / n once and multiply by scale instead of dividing by n.

Better yet would be to omit the scale entirely. The transform is often useful without the scale, or the scale can be omitted from individual transforms and applied just once as an aggregated scale. (E.g., instead of doing two scale operations, one in the forward FFT and one in the inverse FFT, leave the scale operation out of the FFT routines and do it just once after both the forward FFT and the inverse FFT.)

You might be able to omit the bit-reversal permutation, if it is acceptable to have the frequency-domain data in bit-reversed order.

If you keep the bit-reversal permutation, it can be optimized. Techniques for doing this are platform-dependent. Some platforms have an instruction for reversing the bits in an integer (e.g., ARM has rbit). If your platform does not, you might want to keep the bit-reversal indices in a table or investigate ways of computing them more quickly than your current code.

If you keep both the bit-reversal permutation and the scaling, you should consider doing them at the same time. The permutation uses a lot of memory motion, and the scaling uses the processor’s arithmetic unit. Most modern processors can do both of these at once, so you would get some benefit from overlapping operations.

Your current code uses a radix-2 butterfly. Radix-4 is usually better, because it gains some advantage from the fact that multiplication by i can be done just by changing which piece of data is used and changing some additions to subtractions and vice-versa.

If your array length approaches the size of level-one memory cache on your processor, parts of the FFT implementation will thrash cache and slow down significantly unless you design appropriate code to deal with this (often by copying parts of the array into a buffer temporarily).

If your target processor has SIMD features, you should absolutely use those in the FFT; they speed up FFT performance tremendously.

The above should convey to you that writing an efficient FFT is a complicated task. Unless you want to expend significant effort on it, you are likely better off using FFTW or other existing implementations.

Eric Postpischil
  • 195,579
  • 13
  • 168
  • 312
  • Thanks a lot,it was very useful, I will use your offers in optimizing section, but here in this step supporting the input length and order of algorithm is under attention. – bahrami307 Aug 20 '13 at 19:11
0

In your implementation, I'm concerned by this bit of code:

     z =  u1 * c1 - u2 * c2;
     u2 = u1 * c2 + u2 * c1;
     u1 = z;

(u1, u2) is subject to lots and lots of accumulated roundoff error when l1 is large. You will likely get an inaccurate transform.

I'd echo the recommendation of FFTW, but I believe it's highly platform-specific. (Most FFT libraries are.) [EDIT: No, it's actually straight C89. It's exactly what you say you're looking for.]

The Wikipedia FFT page lists a bunch of algorithms that work for weird-sized input arrays. I don't know how they work, but I believe the general idea is that you use Rader's algorithm or Bluestein's algorithm for prime-sized inputs and Cooley-Tukey to reduce composite-sized transforms to a bunch of prime-sized transforms.

tmyklebu
  • 13,915
  • 3
  • 28
  • 57
  • The error in typical FFT implementations has been studied in academic papers and is quite reasonable. FFTW is not platform-specific; it contains a number of different implementations of various pieces of DFT code and measures to determine which of them to use. So it adapts to each platform and provides decent performance (often not as good as code manually tuned for a particular platform by an expert, but still good). – Eric Postpischil Aug 20 '13 at 18:34
  • I checked FFTW.org site but access to the source code was not possible. I download the suggested file but there was no benefit with it for me. Although it has a good manual, but I need to have access to source. Thank you any way – bahrami307 Aug 20 '13 at 18:51
  • @EricPostpischil: The error in an FFT implementation that accumulates the twiddle factor the way his does is much worse, though. Your twiddle factor will typically be off by about n/2 ulps, instead of just a few ulps, at the end because you're computing the n'th power of your approximate root of unity by multiplying 1 by your approximate root of unity n times. (I know you know this; I just don't think you read the code I was referring to.) – tmyklebu Aug 20 '13 at 20:21
  • @tmyklebu: Yes, I agree `c1` and `c2` would be subject to accumulating error, as calculated in the code shown in the question. – Eric Postpischil Aug 20 '13 at 20:36
0

For an alternative to FFTW check out my mix-radix FFT, which also handles non-2^N FFT lengths. The C-source is available from http://www.corix.dk/Mix-FFT/mix-fft.html.

Jens
  • 409
  • 4
  • 11
  • While this link may answer the question, it is better to include the essential parts of the answer here and provide the link for reference. Link-only answers can become invalid if the linked page changes. – Umur Kontacı Jan 10 '15 at 23:51
  • The original poster of the question (bahrami307) requested access to source code in a later post and apparently had problems finding it. That's my motivation for posting the above link as an answer. I usually update links in my earlier postings to keep them alive. – Jens Jan 12 '15 at 19:11