
I want to get the mode and standard deviation of the sum of two random variables. To do that, I create discrete representations of the two PDFs and convolve them to get the PDF of the sum, from which I calculate the mode and standard deviation.
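
For context, this is roughly how I compute the mode and standard deviation from a discretized PDF (the arrays below are just placeholders; in the real code the grid and masses come from the actual distributions):

import numpy as np

# Placeholder discretized PDF: `x` is the support grid, `pdf` the masses on it.
x = np.linspace(0.0, 10.0, 1001)
pdf = np.exp(-0.5 * ((x - 4.0) / 1.2) ** 2)

p = pdf / pdf.sum()                          # normalize to a discrete distribution
mode = x[np.argmax(p)]                       # mode = grid point with the largest mass
mean = np.sum(x * p)
std = np.sqrt(np.sum((x - mean) ** 2 * p))   # standard deviation around the mean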

I am using scipy.signal.fftconvolve, but as the operation is performed around 400K times, this is taking a lot of time.

I want to use the pyFFTW library to speed things up, but I could not make it work. From what I understand, I need to do the following (a rough NumPy sketch of these steps is shown after the list):

  1. Calculate the FFT on both PDFs.
  2. Multiply the FFTs.
  3. Calculate the IFFT of the multiplication of FFTs.
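
For reference, this is what I believe those three steps look like with plain numpy.fft (pdf1 and pdf2 stand in for the discretized PDFs from the gist; zero-padding to the full length is needed so the product of the FFTs gives a linear rather than a circular convolution):

import numpy as np

# Stand-ins for the discretized PDFs from the gist
pdf1 = np.random.rand(100_000)
pdf1 /= pdf1.sum()
pdf2 = np.random.rand(100_000)
pdf2 /= pdf2.sum()

# The FFT length must be at least len(pdf1) + len(pdf2) - 1; otherwise the
# product of the FFTs corresponds to a circular convolution instead of the
# linear one that scipy.signal.fftconvolve(..., 'full') computes.
n = len(pdf1) + len(pdf2) - 1
fft1 = np.fft.rfft(pdf1, n)           # step 1: FFT of both PDFs (zero-padded)
fft2 = np.fft.rfft(pdf2, n)
conv = np.fft.irfft(fft1 * fft2, n)   # steps 2 and 3: multiply and inverse FFT
conv = conv[:len(pdf1)]               # keep the same length as the inputs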

To add insult to injury, neither of the distributions is centered around zero, nor is either of them symmetrical.

The code of a sample of two PDFs and current convolution can be seen here: https://gist.github.com/ajossorioarana/047202db9c2990b43cf7ba2e02735faf

The key section is the last one:

# Do the convolution with scipy.signal.fftconvolve
convolution_pdf = scipy.signal.fftconvolve(pdf1, pdf2, 'full')
convolution_pdf = convolution_pdf[0:len(pdf1)]  # keep the result the same length as the input PDFs

I tried replacing the code above with the following:

## Do the convolution with pyfftw.builders
builder1 = pyfftw.builders.fft(pdf1)   # build an FFTW plan object for each PDF
builder2 = pyfftw.builders.fft(pdf2)

fft1 = builder1()   # execute the plans to obtain the FFTs
fft2 = builder2()

builder3 = pyfftw.builders.ifft(fft1*fft2)

convolution = builder3()
  • Why do you think `pyFFTW` will be faster than `fftconvolve`? And, by the way, your second line of code there is pointless -- you're making a copy of the whole array and overwriting the original. – Tim Roberts Jul 18 '23 at 19:17
  • Thanks! Edited error in the snippet. Second line is to keep convolution the same length as original pdfs. – ArthurOA Jul 18 '23 at 19:27
  • I think `pyFFTW` will be faster based on this [SO question](https://stackoverflow.com/questions/6365623/improving-fft-performance-in-python) and the [pyFFTW documentation](https://pyfftw.readthedocs.io/en/latest/) – ArthurOA Jul 18 '23 at 19:29
  • I'm going to register my dubiousness ;), but can you show us the pyFFTW solution you tried? – Tim Roberts Jul 19 '23 at 00:50
  • Why are you using float16s? There are no arithmetic instructions for float16s, so they'll all have to be converted to float32 and back. Why not use float32 to begin with? – Tim Roberts Jul 19 '23 at 01:12
  • Thanks! Forgot to add that alternative. Edited the answer with the code that uses `pyFFTW`. – ArthurOA Jul 19 '23 at 13:33
  • The idea behind using float16 was to shave some time and memory, if I did not need more precision than that. Judging from your comment that was not a good idea. – ArthurOA Jul 19 '23 at 13:37
  • Using float16 gave me a little more performance – ArthurOA Jul 19 '23 at 13:57
  • I think those are giving you the same results. I'm not convinced that choosing the first 100,000 results from the 199,999 you get from scipy is correct, but I'm not sure. Further, the pyfftw result is complex. You probably want to take the absolute value. Once you do that, I think you have the same result. Certainly the later results are identical. – Tim Roberts Jul 19 '23 at 17:27

1 Answer


To potentially speed things up, you can patch scipy.signal.fftconvolve to use the pyFFTW interface:

from timeit import Timer

import numpy
import pyfftw
import scipy.signal

# Planning first and enabling cache for optimization
shape = (1024, 512)
a = pyfftw.empty_aligned(shape, dtype="complex128")
b = pyfftw.empty_aligned(shape, dtype="complex128")
pyfftw.interfaces.cache.enable()

a[:] = numpy.random.randn(*shape) + 1j * numpy.random.randn(*shape)
b[:] = numpy.random.randn(*shape) + 1j * numpy.random.randn(*shape)

t = Timer(lambda: scipy.signal.fftconvolve(a, b))
print(f"Vanilla: {t.timeit(number=100):1.3f} seconds")

# Patching `scipy.fftpack` with `pyfftw.interfaces.scipy_fftpack`
scipy.fftpack = pyfftw.interfaces.scipy_fftpack
scipy.signal.fftconvolve(a, b)  # warm-up call so the pyFFTW plans are built and cached

print(f"Patched: {t.timeit(number=100):1.3f} seconds")

However, I did not get any noticeable improvement.

pyFFTW can noticeably improve performance on large arrays, but when the same operation is repeated many times on relatively small arrays, the gains tend to be minimal.
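
Note that recent SciPy versions route fftconvolve through scipy.fft rather than scipy.fftpack, so another option is to register pyFFTW as the scipy.fft backend. A minimal sketch, assuming SciPy >= 1.4 and a pyFFTW build that ships pyfftw.interfaces.scipy_fft (I have not benchmarked this for your workload):

import numpy
import pyfftw
import scipy.fft
import scipy.signal

pyfftw.interfaces.cache.enable()  # reuse FFTW plans between calls

rng = numpy.random.default_rng(0)
a = rng.standard_normal(100_000)
b = rng.standard_normal(100_000)

# Route every scipy.fft call (and therefore fftconvolve) through pyFFTW
# for the duration of the with-block.
with scipy.fft.set_backend(pyfftw.interfaces.scipy_fft):
    result = scipy.signal.fftconvolve(a, b, 'full')[:len(a)]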