I recently installed the latest scikit-cuda to compute FFTs on the GPU, but I have run into a problem. Both my data and my FFT window have 2^19 samples, so the array passed to the FFT function has 524288 elements, far below the 2^27-element limit listed in the documentation. Here is the relevant part of my code:
```python
import numpy
import pycuda.autoinit  # creates a CUDA context
import pycuda.gpuarray as gpuarray
from pycuda.elementwise import ElementwiseKernel
import skcuda.fft as cu_fft  # older releases: import scikits.cuda.fft as cu_fft

# Elementwise product: dest[i] = a[i] * b[i]
multiply_them = ElementwiseKernel(
    "float *dest, float *a, float *b",
    "dest[i] = a[i] * b[i]",
    "linear_combination")

# val1, window and fft_window_size (2**19) are defined earlier
gval1 = gpuarray.to_gpu(val1.astype(numpy.float32))      # input * rescale * gain
gwindow = gpuarray.to_gpu(window.astype(numpy.float32))  # filtering window
gval2 = gpuarray.zeros_like(gval1)                       # zero-filled output buffer
multiply_them(gval2, gval1, gwindow)                     # gval2 = gval1 .* gwindow
val1 = gval2.get()                                       # retrieve result from GPU

# MATLAB equivalent of the next step:
#   gval1 = fft(gval1, fft_window_size);
#   gval1 = fftshift(gval1, 1);
#   gval1 = abs(gval1);
gval1 = gpuarray.to_gpu(val1)
gval2 = gpuarray.to_gpu(numpy.empty(fft_window_size, numpy.complex64))
plan_forward = cu_fft.Plan(gval1.shape[0]*2, numpy.float32, numpy.complex64)
cu_fft.fft(gval1, gval2, plan_forward)
# CPU equivalent: val2 = scipy.fftpack.fft(val1, fft_window_size)
val1 = gval2.get()
```
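For comparison, a CPU reference for the same pipeline (window multiply, then the fft / fftshift / abs steps from the MATLAB lines) can be sketched in NumPy. This is an illustrative sketch, not the exact comparison code: `reference_spectrum` is a hypothetical helper, and the 16-point cosine input stands in for the real 2^19-sample data.

```python
import numpy as np

def reference_spectrum(signal, window):
    """CPU reference: window, full FFT, fftshift, magnitude.

    Hypothetical helper mirroring the MATLAB-style steps (fft, fftshift, abs);
    assumes `signal` and `window` are 1-D arrays of equal length.
    """
    windowed = signal.astype(np.float32) * window.astype(np.float32)
    spectrum = np.fft.fft(windowed)           # full N-point complex spectrum
    return np.abs(np.fft.fftshift(spectrum))  # zero frequency moved to index N//2

# Small made-up input: a cosine at bin 3 under a Hann window.
n = 16
t = np.arange(n)
signal = np.cos(2 * np.pi * 3 * t / n)
window = np.hanning(n)
mag = reference_spectrum(signal, window)
print(mag.shape)  # (16,)
```

Because the input is real, the shifted magnitude is symmetric about index `n // 2`, so a correct full-length spectrum should not drop to zero in its second half.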
However, when I compare the result against MATLAB's and SciPy's FFT functions, the values drop to zero halfway through the output array. I can't figure out how to increase the batch size and still get correct numbers. Any advice would be appreciated.
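The halfway drop-off is consistent with how real-to-complex FFTs are sized: for a real input of length N, only N//2 + 1 output bins are independent, and cuFFT's real-to-complex transforms write only that many complex values. The NumPy sketch below illustrates this with `numpy.fft.rfft` and rebuilds the full spectrum through Hermitian symmetry, X[N-k] = conj(X[k]). `full_from_half` is a hypothetical helper, and the example uses a 2^10-point random signal rather than the real data.

```python
import numpy as np

def full_from_half(half, n):
    """Rebuild the full n-point spectrum of a real signal from its
    n//2 + 1 non-redundant bins, using X[n-k] = conj(X[k])."""
    m = n // 2 + 1
    full = np.empty(n, dtype=half.dtype)
    full[:m] = half
    # Bins m..n-1 are conjugates of bins (n-m)..1, in reverse order.
    full[m:] = np.conj(half[1:n - m + 1][::-1])
    return full

n = 2**10  # small stand-in for the 2**19-sample window
x = np.random.default_rng(0).standard_normal(n)
half = np.fft.rfft(x)
print(half.shape)  # (513,)
full = full_from_half(half, n)
print(np.allclose(full, np.fft.fft(x)))  # True
```

The same mirroring works for odd n, where bins m..n-1 map back to bins (m-1)..1.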