Questions tagged [cufft]

cuFFT is an FFT library for CUDA-enabled GPUs. Its capabilities are similar to those of the FFTW library.

cuFFT provides functions for various kinds of forward and inverse Fast Fourier Transforms, including multidimensional transforms and batched transforms.

146 questions
1
vote
1 answer

CUDA cufft library 2D FFT only the left half plane correct

I am doing a 2D FFT on 128 images of size 128 x 128 using the CUFFT library. I used the library as follows: unsigned int nx = 128; unsigned int ny = 128; unsigned int nz = 128; // Make 2D fft batch plan int n[2] = {nx, ny}; int inembed[] =…
Da Teng
  • 551
  • 4
  • 21
1
vote
1 answer

Strategy - CUFFT computing 2D FFT on many images

I am using CUFFT for 2D FFTs on 128 images. Each image is of size 128 x 128. In MATLAB, one 2D FFT takes 0.3 ms, and the FFT on all 128 images takes roughly 128 times as long. Using CUFFT, the execution of the…
Da Teng
  • 551
  • 4
  • 21
1
vote
0 answers

How do I fix an argument error in an fft function that uses skcuda.cufft?

I want to make a Python-wrapped GPU FFT function that can compute the transforms of arbitrarily sized inputs using scikits-cuda.cufft. (I tried PyFFT, which only takes powers of 2.) I modeled my skcuda.cufft code on the CUDA code: __host__…
1
vote
1 answer

Applying cuFFT to OpenGL Vertex Buffer Objects

So the cufftComplex type is an array with n structs with an x and a y-field, respectively representing the real and the imaginary parts of each complex number. On the other hand, if I want to create a vertex buffer object in OpenGL with an x- and…
Jan M.
  • 489
  • 2
  • 5
  • 21
1
vote
1 answer

Why cuFFT is "slow" on K40?

I've compared a simple 3D cuFFT program on both a GTX 780 and a Tesla K40 in double precision mode. On the GTX 780 I measured about 85 Gflops, while on the K40 I measured about 160 Gflops. These results baffled me: the GTX 780 has 166 Gflops of peak…
JohnWil
  • 43
  • 4
1
vote
1 answer

cuFFT wrong results only when starting from complex

I was helped before in this answer to realise an in-place transform and it works well but ONLY if I start with real data. If I start with complex data, the results after IFT+FFT are wrong, and this happens only in the in-place version, I have…
JohnWil
  • 43
  • 4
1
vote
1 answer

Wrong results cufft 3D in-place

I'm writing because I'm having problems with the in-place cufft 3D transform, while I have no problems with the out-of-place version. I tried to follow Robert Crovella's answer here, but I'm not obtaining the correct results when I do an FFT+IFT. This is…
JohnWil
  • 43
  • 4
1
vote
1 answer

Why cufftPlanMany() takes too long?

When cufftPlanMany() is called for the first time, it takes about 0.7 sec, but all subsequent calls are fast. Any idea how to accelerate the first call of cufftPlanMany()?
Maghraby
  • 11
  • 3
1
vote
1 answer

How to view CUDA library function calls in profiler?

I am using the cuFFT library. How do I modify my code to see the function calls from this library (or any other CUDA library) in the NVIDIA Visual Profiler NVVP? I am using Windows and Visual Studio 2013. Below is my code. I convert my image and…
user8919
  • 67
  • 2
  • 9
1
vote
1 answer

CUFFT is 1000x slower in VS2013/Cuda7.0 compared to VS2010/Cuda4.2

This simple CUFFT code was run in two IDEs: VS 2013 with CUDA 7.0 and VS 2010 with CUDA 4.2. I found that VS 2013 with CUDA 7.0 was approximately 1000 times slower. The code executed in 0.6 ms in VS 2010 and took 520 ms in VS 2013, both on an…
The Vivandiere
  • 3,059
  • 3
  • 28
  • 50
1
vote
1 answer

CUDA cuFFT Undefined symbols for architecture x86_64

I'm trying to use the cuFFT library, but when I compile my project I get the error: Undefined symbols for architecture x86_64: "_cufftDestroy" ... "_cufftExecC2C" ... "_cufftPlan1d" ... ld: symbol(s) not found for architecture x86_64 clang: error:…
mary
  • 305
  • 3
  • 12
1
vote
1 answer

CUDA FFT plan reuse across multiple 'overlapped' CUDA Stream launches

I'm trying to improve the performance of my code using asynchronous memory transfers overlapped with GPU computation. Formerly I had code in which I created an FFT plan and then used it multiple times. In that situation the time invested…
1
vote
0 answers

Compute several FFT with GPU using Python multiprocessing and pyfft: how to avoid GPU memory leak?

I am trying to implement in Python the following pattern for multi-CPU and single-GPU computation using the pycuda and pyfft packages. I would like to have several processes (e.g. launched with multiprocessing.Pool()), each of them able to perform…
mtazzari
  • 451
  • 1
  • 5
  • 14
1
vote
1 answer

How can I get the full fft coefficients by cufft?

I am doing a two-dimensional FFT with cufft. The transform type is real-to-complex, so the size of the output array is NX * (NY / 2 + 1), which is non-redundant. But I need the full set of coefficients, including the redundant ones. How can I get them all? Thanks…
Wang Wang
  • 115
  • 1
  • 9
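
The redundant half of a real-to-complex spectrum can, in principle, be rebuilt from the stored half by Hermitian symmetry: for real input, X[k1][k2] equals the complex conjugate of X[(NX - k1) mod NX][(NY - k2) mod NY]. The sketch below illustrates that idea on the host in plain Python. It is not the asker's code and does not call cuFFT; `dft2` and `expand_half_spectrum` are hypothetical helper names, and the naive DFT is only a reference used to check the reconstruction.

```python
import cmath

def dft2(x):
    """Naive 2D DFT of an NX-by-NY list of lists (slow reference, for checking only)."""
    nx, ny = len(x), len(x[0])
    out = [[0j] * ny for _ in range(nx)]
    for k1 in range(nx):
        for k2 in range(ny):
            acc = 0j
            for n1 in range(nx):
                for n2 in range(ny):
                    angle = -2 * cmath.pi * (k1 * n1 / nx + k2 * n2 / ny)
                    acc += x[n1][n2] * cmath.exp(1j * angle)
            out[k1][k2] = acc
    return out

def expand_half_spectrum(half, ny):
    """Rebuild the full NX-by-NY spectrum from an NX-by-(NY//2 + 1) half spectrum.

    Assumes the original input was real, so the missing columns follow
    from Hermitian symmetry.
    """
    nx = len(half)
    nyh = ny // 2 + 1
    full = [[0j] * ny for _ in range(nx)]
    for k1 in range(nx):
        for k2 in range(ny):
            if k2 < nyh:
                # Stored (non-redundant) coefficient: copy as-is.
                full[k1][k2] = half[k1][k2]
            else:
                # Redundant coefficient: conjugate of the mirrored entry.
                full[k1][k2] = half[(nx - k1) % nx][ny - k2].conjugate()
    return full

# Small illustrative input (values are arbitrary).
NX, NY = 4, 6
x = [[float((3 * i + 5 * j) % 7) for j in range(NY)] for i in range(NX)]
reference = dft2(x)
half = [row[: NY // 2 + 1] for row in reference]  # layout of a real-to-complex result
full = expand_half_spectrum(half, NY)
```

On the GPU the same index mapping could be applied in a small kernel over the output array; whether that is worthwhile depends on whether the downstream code really needs the redundant values.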
1
vote
1 answer

Batched FFTs using cufftPlanMany

I want to perform 441 2D, 32-by-32 FFTs using the batched method provided by the cuFFT library. The parameters of the transform are the following: int n[2] = {32,32}; int inembed[] = {32,32}; int onembed[] =…
Teller
  • 175
  • 2
  • 9
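
For batched plans of this kind, the cuFFT documentation describes the 2D advanced data layout as addressing input element (x, y) of batch b at `b * idist + (x * inembed[1] + y) * istride`. The sketch below only evaluates that index formula in Python so the layout can be checked by hand. `input_offset` is a hypothetical helper, not a cuFFT function, and the stride and distance values are assumptions for tightly packed images, since the question's own values are truncated.

```python
def input_offset(b, x, y, inembed, istride, idist):
    """Linear index of element (x, y) in batch b under the 2D advanced layout."""
    return b * idist + (x * inembed[1] + y) * istride

# Sizes taken from the question; istride and idist are ASSUMED (contiguous packing).
n = (32, 32)
batch = 441
inembed = (32, 32)
istride = 1             # assumption: consecutive elements are adjacent in memory
idist = n[0] * n[1]     # assumption: each image starts right after the previous one

first_of_second_image = input_offset(1, 0, 0, inembed, istride, idist)
last_element = input_offset(batch - 1, n[0] - 1, n[1] - 1, inembed, istride, idist)
total_elements = batch * n[0] * n[1]
```

With contiguous packing the last index should be one less than the total element count, which is a quick sanity check before passing the same values to cufftPlanMany.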