-3

I would like to do a linear convolution for a signal of length 4000*270, with a kernel of length 16000. The signal is not fixed while the kernel is fixed. This needs to be repeated for many times for my purpose, so I want to improve the speed as soon as possible. I can implement this convolution in either R or C.

At first, I tried doing the convolution in R, but the speed cannot satisfy my need. I tried doing it by iteration and it was too slow. I also tried doing it using FFT, but because both signal and kernel are long, FFT didn't improve the speed a lot.

Then I decided to do convolution iteratively in C. But C seems not to be able to handle such amount of calculation and reported error very often. Even when it works, it is still very slow. I also tried doing fft convolution in C, but the program always shut down.

I found this code from a friend of mine and not sure about the original source. I will delete it if there is a copyright issue.This is the C code I used for doing fft in C, but the program cannot handle the long vector with length 2097152 (the smallest power of 2 greater than or equal to the signal vector length).

#define q    3        /* for 2^3 points */
#define N    2097152        /* N-point FFT, iFFT */

typedef float real;
typedef struct{real Re; real Im;} complex;

#ifndef PI
# define PI    3.14159265358979323846264338327950288
#endif


void fft( complex *v, int n, complex *tmp )
                   {
if(n>1) {            /* otherwise, do nothing and return */
  int k,m;    
  complex z, w, *vo, *ve;
  ve = tmp; 
  vo = tmp+n/2;

  for(k=0; k<n/2; k++) {
          ve[k] = v[2*k];
          vo[k] = v[2*k+1];
      }
  fft( ve, n/2, v );        /* FFT on even-indexed elements of v[] */
  fft( vo, n/2, v );        /* FFT on odd-indexed elements of v[] */
  for(m=0; m<n/2; m++) {
      w.Re = cos(2*PI*m/(double)n);
      w.Im = -sin(2*PI*m/(double)n);
      z.Re = w.Re*vo[m].Re - w.Im*vo[m].Im;    /* Re(w*vo[m]) */
      z.Im = w.Re*vo[m].Im + w.Im*vo[m].Re;    /* Im(w*vo[m]) */
      v[  m  ].Re = ve[m].Re + z.Re;
      v[  m  ].Im = ve[m].Im + z.Im;
      v[m+n/2].Re = ve[m].Re - z.Re;
      v[m+n/2].Im = ve[m].Im - z.Im;
      }
      }
  return;
       }

  void ifft( complex *v, int n, complex *tmp )
          {
    if(n>1) {            /* otherwise, do nothing and return */
           int k,m;    
           complex z, w, *vo, *ve;
           ve = tmp; 
           vo = tmp+n/2;
    for(k=0; k<n/2; k++) {
          ve[k] = v[2*k];
          vo[k] = v[2*k+1];
          }
    ifft( ve, n/2, v );        /* FFT on even-indexed elements of v[] */
    ifft( vo, n/2, v );        /* FFT on odd-indexed elements of v[] */
    for(m=0; m<n/2; m++) {
            w.Re = cos(2*PI*m/(double)n);
            w.Im = sin(2*PI*m/(double)n);
            z.Re = w.Re*vo[m].Re - w.Im*vo[m].Im;    /* Re(w*vo[m]) */
            z.Im = w.Re*vo[m].Im + w.Im*vo[m].Re;    /* Im(w*vo[m]) */
            v[  m  ].Re = ve[m].Re + z.Re;
            v[  m  ].Im = ve[m].Im + z.Im;
            v[m+n/2].Re = ve[m].Re - z.Re;
            v[m+n/2].Im = ve[m].Im - z.Im;
            }
            }
        return;
        }

I found this page talking about long signal convolution https://ccrma.stanford.edu/~jos/sasp/Convolving_Long_Signals.html But I'm not sure how to use the idea in it. Any thoughts would be truly appreciated and I'm ready to provide more information about my question.

Ashley
  • 67
  • 1
  • 7
  • 2
    Up front; *"I will delete it if there's a copyright issue"* is absolutely insufficient these days: unless it is properly scrubbed by StackExchange employees, I can view deleted questions with all of their edit history (and I'm not a moderator). Even if they do scrub it well, realize that the Wayback Machine *does* track [stackoverflow history](https://web.archive.org/web/*/stackoverflow.com), so even if deleted here, it is available "forever" elsewhere. I hope the code is safe, because your assumption of safe deletion is not correct. – r2evans Jan 08 '19 at 16:03
  • Since your original problem is how to calculate the convolution of two relatively large pieces of data, namely the `kernel` and the `signal`, I would think you need a DSP solution of sorts. I think your question belongs better here on [dsp exchange](http://dsp.stackexchange.com). Just off the top of my head, you could probably do a piece-wise multiplication of the FFT domain `signal`&`kernel` but this would require some overlapping (if I remember my Masters courses correctly now). Like I said, this is more DSP than programming, also you could `downsample` your signal but will lose resolution. – Duck Dodgers Jan 08 '19 at 16:22
  • Have you seen this link : https://software.intel.com/en-us/ipp-dev-reference-fir-filter-functions ? – Stef1611 Jan 08 '19 at 16:22
  • 1
    @r2evans Thanks for pointing that out. I will pay more attention on this issue in the future. – Ashley Jan 08 '19 at 16:31
  • @JoeyMallone Thanks for your response. Yes I agree with you, this is more DSP than programming. I will check how to do the piece-wise multiplication. – Ashley Jan 08 '19 at 16:32
  • @Stef1611 Thanks I'll check this out. – Ashley Jan 08 '19 at 16:34
  • @Qingfang. Do you have a high data flow rate (>tens of MHz ) ? Perhaps, you could have a look at this link : https://arxiv.org/pdf/1610.03360.pdf – Stef1611 Jan 08 '19 at 16:53
  • Using the FFT is the right way to do a convolution with a larger kernel. Anything else will be slower. – Cris Luengo Jan 08 '19 at 17:55
  • 2
    "But C seems not to be able to handle such amount of calculation and reported error very often. Even when it works, it is still very slow." This is a problem with your coding, not with C. Also, you should use a proper FFT library, like FFTW, which should be orders or magnitude faster than the code you posted. – Cris Luengo Jan 08 '19 at 17:55
  • @CrisLuengo I'm using FFTW now, but there is "segmentation fault" error when I try a long vector of signal, in my case, 270*4000. Is it possible to solve this issue? Thanks. – Ashley Jan 08 '19 at 20:52
  • 2
    @QingfangLiu: I'm sure FFTW can handle signals of that length. If you get a segmentation fault you're likely passing the wrong information to the FFTW function, or using a pointer that doesn't point to the right location. – Cris Luengo Jan 08 '19 at 21:29

1 Answers1

0

The most common efficient long FIR filter method is to use FFT/IFFT overlap-add (or overlap-save) fast convolution, as per the CCRMA paper you referenced. Just chop your data into shorter blocks more suitable for your FFT library and processor data cache sizes, zero-pad by at least the filter kernel length, FFT filter, and sequentially overlap-add the remainder/tails after each IFFT.

Huge long FFTs will most likely trash your processor's caches, which will likely dominate over any algorithmic O(NlogN) speedup.

hotpaw2
  • 70,107
  • 14
  • 90
  • 153