1

I have complex data of size 1024*128*20. I need to compute a 1024-point FFT for each of the 128*20 blocks. I am planning to use Intel MKL or Intel IPP for this. Is it possible to parallelise the code using Intel MKL or IPP? Which one, MKL or IPP, will give the lower computation time?

Kara
user4661268
  • Too broad, really. You can't answer this for all CPUs, and if you have one particular CPU in mind you'd just run the test. – MSalters May 03 '16 at 08:17

3 Answers

3

I suggest you read the article at https://software.intel.com/en-us/articles/mkl-ipp-choosing-an-fft/ as it provides a good comparison, which will make it easier to decide which library is better for your use case.

Both IPP and MKL can do the job, but which one takes less computation time may depend on your hardware, as they are optimized differently. For example, IPP's FFT functions only work with power-of-2 lengths (its separate DFT functions accept other lengths), while MKL may be more versatile (according to the article).

(Sorry for bumping an "old" question, but an answer hasn't been chosen and the question is still relevant)

0

I think they have the same performance, as they are both developed by Intel. I would prefer MKL, as it has more users.

Both MKL and IPP have parallel FFT support. However, I would suggest you exploit the parallelism at a higher level, as you have many FFT blocks to compute: run the blocks in parallel and use the sequential version of MKL for each 1024-point FFT.
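As a rough sketch of that block-level parallelism, assuming OpenMP is available and each block is stored contiguously: the outer loop over blocks is parallelised, and each block gets one sequential FFT. The names `cd`, `fftRadix2`, and `fftBlocks` are hypothetical, and `fftRadix2` is only a plain radix-2 stand-in for the sequential library call, not MKL or IPP code.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cd = std::complex<double>;

// Stand-in for a sequential library FFT (NOT MKL or IPP): in-place iterative
// radix-2 forward FFT of n points, where n must be a power of two.
void fftRadix2(cd* x, std::size_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);
    // Bit-reversal permutation.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(x[i], x[j]);
    }
    const double pi = std::acos(-1.0);
    // Butterfly stages of length 2, 4, ..., n.
    for (std::size_t len = 2; len <= n; len <<= 1) {
        for (std::size_t start = 0; start < n; start += len) {
            for (std::size_t k = 0; k < len / 2; ++k) {
                const cd w = std::polar(1.0, -2.0 * pi * double(k) / double(len));
                const cd u = x[start + k];
                const cd v = x[start + k + len / 2] * w;
                x[start + k] = u + v;
                x[start + k + len / 2] = u - v;
            }
        }
    }
}

// Transforms numBlocks contiguous blocks of blockLen points each, in place.
// The loop over blocks is what runs in parallel; each FFT itself is sequential.
void fftBlocks(std::vector<cd>& data, std::size_t blockLen, std::size_t numBlocks)
{
    assert(data.size() == blockLen * numBlocks);
    const std::ptrdiff_t count = static_cast<std::ptrdiff_t>(numBlocks);
    #pragma omp parallel for schedule(static)
    for (std::ptrdiff_t b = 0; b < count; ++b)
        fftRadix2(data.data() + static_cast<std::size_t>(b) * blockLen, blockLen);
}
```

For the data in the question this would be called as `fftBlocks(data, 1024, 128 * 20)`. The pragma only takes effect when OpenMP is enabled at compile time (for example `-fopenmp` with GCC); without it the loop simply runs serially.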

kangshiyin
0

Intel suggests a solution for multiple FFTs with the same parameters: https://www.intel.com/content/www/us/en/develop/documentation/onemkl-developer-reference-c/top/fourier-transform-functions/fft-functions/configuration-settings/dfti-number-of-transforms.html

The point is that you feed it the whole dataset, and it takes care of the parallelization.

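Beware of conjugate-even symmetry, though: for real input, only the first N/2 + 1 complex values of each transform are stored, because the remaining ones are their complex conjugates.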

Here's a minimal example:

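#include <mkl.h>
#include <complex>
#include <fstream>
#include <vector>

int main(void)
{
    // MKL_LONG rather than int: DftiSetValue is variadic and expects
    // counts and distances as MKL_LONG.
    MKL_LONG inputLength = 1024;
    MKL_LONG numOfTransforms = 8;

    std::vector<double> inputData(numOfTransforms * inputLength, 0.0);

    // Only the first inputLength / 2 + 1 bins of each transform are written
    // (conjugate-even storage); the rest of each row stays unused.
    std::vector<std::complex<double>> spectrum(inputLength * numOfTransforms);

    // ...
    // This is where you fill your matrix with useful data
    // ...

    // "input.bin" is a placeholder file name, not part of the original example.
    std::ifstream file("input.bin", std::ios::binary);
    file.read(reinterpret_cast<char *>(inputData.data()), sizeof(double) * numOfTransforms * inputLength);
    // At this point, input data contains 8 arrays in one, row-major.
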
    DFTI_DESCRIPTOR_HANDLE fftHandle;

    // Creating a handle with double precision, real input, and along 1st dimension of length inputLength
    auto status = DftiCreateDescriptor(&fftHandle, DFTI_DOUBLE, DFTI_REAL, 1, inputLength);

    status = DftiSetValue(fftHandle, DFTI_NUMBER_OF_TRANSFORMS, numOfTransforms); // nu
    status = DftiSetValue(fftHandle, DFTI_INPUT_DISTANCE, inputLength);
    status = DftiSetValue(fftHandle, DFTI_OUTPUT_DISTANCE, inputLength);
    status = DftiSetValue(fftHandle, DFTI_PLACEMENT, DFTI_NOT_INPLACE);

    // this is important, as the default option is DFTI_COMPLEX_REAL, which is deprecated.
    status = DftiSetValue(fftHandle, DFTI_CONJUGATE_EVEN_STORAGE, DFTI_COMPLEX_COMPLEX);
    status = DftiCommitDescriptor(fftHandle);

    DftiComputeForward(fftHandle, inputData.data(), spectrum.data());

    DftiFreeDescriptor(&fftHandle);
    return 0;
}
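
The conjugate-even caveat can be illustrated without MKL. The sketch below uses a naive O(N^2) DFT, purely for illustration, to compute only bins 0..N/2 of a real signal and then rebuilds the remaining bins from X[N - k] = conj(X[k]). The names `cd`, `realDftHalf`, and `expandConjugateEven` are hypothetical helpers, not MKL functions.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cd = std::complex<double>;

// Naive O(N^2) forward DFT of a real signal, returning only bins 0..N/2,
// i.e. the N/2 + 1 values that a conjugate-even storage format keeps.
std::vector<cd> realDftHalf(const std::vector<double>& x)
{
    const std::size_t n = x.size();
    assert(n > 0);
    const double pi = std::acos(-1.0);
    std::vector<cd> half(n / 2 + 1);
    for (std::size_t k = 0; k <= n / 2; ++k) {
        cd sum(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j)
            sum += x[j] * std::polar(1.0, -2.0 * pi * double(k * j % n) / double(n));
        half[k] = sum;
    }
    return half;
}

// Rebuilds all N bins from the stored half using X[N - k] = conj(X[k]).
std::vector<cd> expandConjugateEven(const std::vector<cd>& half, std::size_t n)
{
    assert(half.size() == n / 2 + 1);
    std::vector<cd> full(n);
    for (std::size_t k = 0; k <= n / 2; ++k) full[k] = half[k];
    for (std::size_t k = n / 2 + 1; k < n; ++k) full[k] = std::conj(half[n - k]);
    return full;
}
```

With the MKL example above, the same mirroring would be applied per transform if the full 1024-bin spectrum is needed. For complex input, as in the question, there is no such symmetry and all N bins are computed.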
sudoLife