
I am having trouble with high memory usage when performing FFTs with SciPy's fftpack. Here is an example, obtained with the memory_profiler module:

Line #    Mem usage    Increment   Line Contents
================================================
 4   50.555 MiB    0.000 MiB   @profile
 5                             def test():
 6  127.012 MiB   76.457 MiB       a = np.random.random(int(1e7))
 7  432.840 MiB  305.828 MiB       b = fftpack.fft(a)
 8  891.512 MiB  458.672 MiB       c = fftpack.ifft(b)
 9  585.742 MiB -305.770 MiB       del b, c
10  738.629 MiB  152.887 MiB       b = fftpack.fft(a)
11  891.512 MiB  152.883 MiB       c = fftpack.ifft(b)
12  509.293 MiB -382.219 MiB       del a, b, c
13  547.520 MiB   38.227 MiB       a = np.random.random(int(5e6))
14  700.410 MiB  152.891 MiB       b = fftpack.fft(a)
15  929.738 MiB  229.328 MiB       c = fftpack.ifft(b)
16  738.625 MiB -191.113 MiB       del a, b, c
17  784.492 MiB   45.867 MiB       a = np.random.random(int(6e6))
18  967.961 MiB  183.469 MiB       b = fftpack.fft(a)
19 1243.160 MiB  275.199 MiB       c = fftpack.ifft(b)
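
For completeness, the script that produced this output. The body of test() can be read straight off the profiler listing above; the imports and the call at the bottom are the standard memory_profiler setup and are filled in by hand:

import numpy as np
from scipy import fftpack
from memory_profiler import profile

@profile
def test():
    a = np.random.random(int(1e7))
    b = fftpack.fft(a)
    c = fftpack.ifft(b)
    del b, c
    b = fftpack.fft(a)
    c = fftpack.ifft(b)
    del a, b, c
    a = np.random.random(int(5e6))
    b = fftpack.fft(a)
    c = fftpack.ifft(b)
    del a, b, c
    a = np.random.random(int(6e6))
    b = fftpack.fft(a)
    c = fftpack.ifft(b)

if __name__ == "__main__":
    test()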

My attempt at understanding what is going on here:

  1. The amount of memory allocated by both fft and ifft on lines 7 and 8 is more than what they need to allocate to return a result. For the call b = fftpack.fft(a), 305 MiB is allocated. The memory needed for the b array is 16 B/value * 1e7 values ≈ 153 MiB (16 B per value, as the code returns complex128). It seems that fftpack allocates some kind of workspace, and that the workspace is equal in size to the output array (305 MiB ≈ 2 * 153 MiB); a quick check of this arithmetic follows the list.

  2. On lines 10 and 11 the same procedure is run again, but this time the memory usage is lower and more in line with what I expect (about 153 MiB per array). It therefore seems that fftpack is able to reuse its workspace.

  3. On lines 13-15 and 17-19, FFTs with different, smaller input sizes are performed. In both cases more memory than needed is allocated, and the workspace does not seem to be reused.
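
A quick standalone check of the arithmetic in point 1 (sizes only, no profiling):

import numpy as np
from scipy import fftpack

a = np.random.random(int(1e7))
b = fftpack.fft(a)
print(b.dtype)           # complex128, i.e. 16 B per value
print(b.nbytes / 2**20)  # ~152.6 MiB, about half of the 305 MiB increment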

The memory usage reported above agrees with what the Windows Task Manager reports (to the accuracy with which I can read those graphs). If I run such a script with larger input sizes, I can make my (Windows) computer very slow, which indicates that it is swapping.

A second example to illustrate the problem of the memory allocated for workspace:

import numpy as np
from scipy import fftpack
from time import time

factor = 4.5
a = np.random.random(int(factor * 3e7))
start = time()
b = fftpack.fft(a)
c = fftpack.ifft(b)
end = time()
print("Elapsed: {:.4g}".format(end - start))
del a, b, c
print("Finished first fft")

a = np.random.random(int(factor * 2e7))
start = time()
b = fftpack.fft(a)
c = fftpack.ifft(b)
end = time()
print("Elapsed: {:.4g}".format(end - start))
del a, b, c
print("Finished second fft")

The code prints the following:

Elapsed: 17.62
Finished first fft
Elapsed: 38.41
Finished second fft

Notice how the second FFT, which has the smaller input size, takes more than twice as long to compute. I also noticed that my computer was very slow (likely swapping) during the execution of this script.

Questions:

  • Is it correct that the FFT can be calculated in place, without the need for extra workspace? If so, why doesn't fftpack do that?

  • Is there a problem with fftpack here? Even if it needs extra workspace, why does it not reuse that workspace when the FFT is rerun with different input sizes?

  • You could alternatively use FFTW (https://pyfftw.github.io/pyFFTW/) - it's faster than fftpack and gives you more control over the memory. – Dietrich Mar 07 '17 at 23:16
  • @Dietrich Yes, I am aware of pyFFTW. My main issue with it is that I am on windows, and I did not find a conda package for win64. That means I would likely have to sort out lots of linking issues, which I am not in the mood for... Also, FFTW is under the GPL, which is unacceptable in some situations. – josteinb Mar 08 '17 at 08:22

1 Answer


This is a known issue, caused by fftpack caching its strategy for computing the FFT of a given size. The cache is about as large as the output of the computation, so doing large FFTs with many different input sizes makes the memory consumption significant.

The problem is described in detail here:

https://github.com/scipy/scipy/issues/5986

Numpy has a similar problem, which is being worked on:

https://github.com/numpy/numpy/pull/7686
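
Until that is fixed upstream, one workaround is to pad every input up to a single common length with the n argument of fftpack.fft, so that only one strategy is ever cached and then reused. Note that this returns the transform of the zero-padded signal, not of the original one, which is only acceptable in some applications (e.g. FFT-based convolution). A minimal sketch (FFT_LEN and padded_fft are names I made up for illustration):

import numpy as np
from scipy import fftpack

FFT_LEN = 2**25  # one fixed transform length; must be >= the largest input

def padded_fft(a):
    # fftpack zero-pads a up to n=FFT_LEN, so every call hits the same
    # cached strategy instead of creating a new cache entry per size.
    # Caveat: this is the transform of the padded signal, not of a itself.
    return fftpack.fft(a, n=FFT_LEN)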
