
Several users have asked about the speed or memory consumption of image convolutions in numpy or scipy [1, 2, 3, 4]. From the responses and my experience using NumPy, I believe this may be a major shortcoming of NumPy compared to MATLAB or IDL.

None of the answers so far have addressed the overall question, so here it is: "What is the fastest method for computing a 2D convolution in Python?" Common Python modules are fair game: numpy, scipy, and PIL (others?). For the sake of a challenging comparison, I'd like to propose the following rules (a minimal timing harness is sketched below the list):

  1. Input matrices are 2048x2048 and 32x32, respectively.
  2. Single or double precision floating point are both acceptable.
  3. Time spent converting your input matrix to the appropriate format doesn't count -- just the convolution step.
  4. Replacing the input matrix with your output is acceptable (does any python library support that?)
  5. Direct DLL calls to common C libraries are alright -- lapack or scalapack
  6. PyCUDA is right out. It's not fair to use your custom GPU hardware.
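
To make entries comparable, here's a minimal sketch of the timing harness I have in mind (`scipy.ndimage.convolve` is just a placeholder baseline; substitute your contender):

import timeit

setup = """
import numpy as np
from scipy import ndimage
x = np.random.random((2048, 2048)).astype(np.float32)
y = np.random.random((32, 32)).astype(np.float32)
"""
# Array creation/conversion happens in setup and is excluded per rule 3;
# only the convolution itself is timed. output=x exercises rule 4
# (writing the result over the input).
print(timeit.timeit("ndimage.convolve(x, y, output=x)", setup=setup, number=1))
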
Carl F.
  • "Replacing the input matrix with your output is acceptable (does any python library support that?)" For what it's worth, most numpy and scipy functions do... – Joe Kington Apr 19 '11 at 02:23
  • I don't see any mention of that in the docs for convolve: http://docs.scipy.org/doc/numpy/reference/generated/numpy.convolve.html Am I missing something? – Carl F. Apr 19 '11 at 15:49
  • It's not supported for numpy's convolve, but it is for `scipy.ndimage.convolve`. http://www.scipy.org/SciPyPackages/Ndimage Also, most numpy functions (e.g. `sqrt`, `mul`, `add`) take an out parameter. You can do `np.sqrt(x, x)` to take the sqrt in-place. – Joe Kington Apr 19 '11 at 16:45

5 Answers


It really depends on what you want to do... A lot of the time, you don't need a fully generic (read: slower) 2D convolution. If the filter is separable, you can use two 1D convolutions instead. (This is why scipy.ndimage.gaussian_filter and scipy.ndimage.uniform_filter are much faster than the same thing implemented as a generic n-D convolution.)
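
For instance, here's a minimal sketch of the separable approach, assuming the 32x32 kernel happens to be an outer product of two 1D kernels (a real kernel must actually be separable for this to apply):

import numpy as np
from scipy import ndimage

x = np.random.random((2048, 2048)).astype(np.float32)
# Hypothetical separable kernel: the full 2D kernel would be np.outer(ky, kx).
ky = np.random.random(32).astype(np.float32)
kx = np.random.random(32).astype(np.float32)

# Two 1D passes do the work of one 32x32 2D convolution:
# roughly 64 multiply-adds per pixel instead of roughly 1024.
out = ndimage.convolve1d(ndimage.convolve1d(x, ky, axis=0), kx, axis=1)
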

At any rate, as a point of comparison:

import timeit
t = timeit.timeit(stmt='ndimage.convolve(x, y, output=x)', number=1,
setup="""
import numpy as np
from scipy import ndimage
x = np.random.random((2048, 2048)).astype(np.float32)
y = np.random.random((32, 32)).astype(np.float32)
""")
print(t)

This takes 6.9 sec on my machine...

Compare this with fftconvolve:

t = timeit.timeit(stmt="signal.fftconvolve(x, y, mode='same')", number=1,
setup="""
import numpy as np
from scipy import signal
x = np.random.random((2048, 2048)).astype(np.float32)
y = np.random.random((32, 32)).astype(np.float32)
""")
print(t)

This takes about 10.8 secs. However, with different input sizes, using FFTs to do a convolution can be considerably faster (though I can't seem to come up with a good example at the moment...).
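If you want to find where the crossover lies for your own sizes, a quick sketch like this times both approaches over a range of kernel sizes (the sizes here are just illustrative):

import timeit

# Compare direct vs. FFT-based convolution as the kernel grows.
for k in (8, 16, 32, 64):
    setup = """
import numpy as np
from scipy import ndimage, signal
x = np.random.random((2048, 2048)).astype(np.float32)
y = np.random.random(({k}, {k})).astype(np.float32)
""".format(k=k)
    direct = timeit.timeit("ndimage.convolve(x, y)", setup=setup, number=1)
    fft = timeit.timeit("signal.fftconvolve(x, y, mode='same')", setup=setup, number=1)
    print(k, direct, fft)
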

Joe Kington
  • Thanks Joe. This is a big improvement over the convolve function I had been using (I think it was just numpy.convolve). It consumed huge amounts of RAM and ran slowly (possibly as a result). I am hoping to get more participation, but maybe I'm too optimistic. – Carl F. Apr 20 '11 at 03:34
  • For those interested, I did this comparison (OS X 10.10 MacBook Air) 5 years later than the original post. `signal.fftconvolve` takes about **.9 seconds**! `ndimage.convolve` takes about **8 seconds**. Huge improvements have apparently been made to `signal.fftconvolve` under the hood. – nmante Mar 11 '16 at 09:15

On my machine, a hand-crafted circular convolution using FFTs seems to be fastest:

import numpy
x = numpy.random.random((2048, 2048)).astype(numpy.float32)
y = numpy.random.random((32, 32)).astype(numpy.float32)
# Zero-pad y up to x's shape, multiply the two spectra, and transform back.
z = numpy.fft.irfft2(numpy.fft.rfft2(x) * numpy.fft.rfft2(y, x.shape))

Note that this might treat the areas close to the edges differently from the other approaches, because it's a circular convolution.
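
If the wrap-around at the edges matters for your application, you can turn this into an ordinary linear convolution by zero-padding both arrays to the full output size before transforming; a minimal sketch:

import numpy
x = numpy.random.random((2048, 2048)).astype(numpy.float32)
y = numpy.random.random((32, 32)).astype(numpy.float32)

# Pad both transforms to the full linear-convolution size (2079x2079 here)
# so nothing wraps around, then crop the region you need afterwards.
shape = (x.shape[0] + y.shape[0] - 1, x.shape[1] + y.shape[1] - 1)
z = numpy.fft.irfft2(numpy.fft.rfft2(x, shape) * numpy.fft.rfft2(y, shape), shape)
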

Sven Marnach

I did some experiments with this too. My guess is that the SciPy convolution does not use the BLAS library to accelerate the computation. Using BLAS, I was able to code a 2D convolution that was comparable in speed to MATLAB's. It's more work, but your best bet is to recode the convolution in C++.

Here is the tight part of the loop (please forgive the weird ()-based array referencing; it's my convenience class for MATLAB arrays). The key part is that you don't iterate over the image: you iterate over the filter and let BLAS iterate over the image, because typically the image is much larger than the filter. (A NumPy sketch of the same idea follows the C++ code.)

// Loop over the (flipped) filter taps and, for each tap, let BLAS sweep
// an entire image column at once. diffM and diffN (not shown) are
// presumably the "valid" output dimensions.
for (int n = 0; n < filt.numCols; n++)
{
  for (int m = 0; m < filt.numRows; m++)
  {
    // Index from the end so this is a true convolution (filter flipped),
    // not a correlation.
    const double filt_val = filt(filt.numRows - 1 - m, filt.numCols - 1 - n);
    for (int i = 0; i < diffN; i++)
    {
      double *out_ptr = &outImage(0, i);
      const double *im_ptr = &image(m, i + n);
      // outImage(0:diffM-1, i) += filt_val * image(m:m+diffM-1, i+n)
      cblas_daxpy(diffM, filt_val, im_ptr, 1, out_ptr, 1);
    }
  }
}
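
The same tap-by-tap accumulation is easy to mimic in pure NumPy (a rough sketch, using whole-array slices where the C++ uses daxpy):

import numpy as np

def conv2d_valid(image, filt):
    """'Valid'-mode 2D convolution by looping over filter taps only."""
    M = image.shape[0] - filt.shape[0] + 1
    N = image.shape[1] - filt.shape[1] + 1
    out = np.zeros((M, N), dtype=image.dtype)
    for m in range(filt.shape[0]):
        for n in range(filt.shape[1]):
            # Flip the filter so this is convolution, not correlation.
            out += filt[-1 - m, -1 - n] * image[m:m + M, n:n + N]
    return out
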
Marshall

I have been trying to improve the convolution speed in my application, and I have been using signal.correlate, which happens to be about 20 times slower than signal.correlate2d when the input matrices are small (27x27 and 5x5, in my case). As of 2018, this is what I observed on my machine (Dell Inspiron 13, Core i5) for the matrices specified in the actual question.

OpenCV did the best, but the caveat is that it doesn't give "mode" options: input and output are always the same size. (A cropping workaround is sketched after the timings.)

>>> import time
>>> import numpy as np
>>> import cv2
>>> from scipy import signal
>>> img = np.random.rand(2048, 2048)
>>> kernel = np.ones((32, 32), dtype=np.float64)
>>> t1 = time.time(); dst1 = cv2.filter2D(img, -1, kernel); print(time.time() - t1)
0.208490133286
>>> t1 = time.time(); dst2 = signal.correlate(img, kernel, mode='valid', method='fft'); print(time.time() - t1)
0.582989931107
>>> t1 = time.time(); dst3 = signal.convolve2d(img, kernel, mode='valid'); print(time.time() - t1)
11.2672450542
>>> t1 = time.time(); dst4 = signal.correlate2d(img, kernel, mode='valid'); print(time.time() - t1)
11.2443971634
>>> t1 = time.time(); dst5 = signal.fftconvolve(img, kernel, mode='valid'); print(time.time() - t1)
0.581533193588
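
If you need a 'valid'-sized result from OpenCV, one workaround (a sketch based on filter2D's documented anchor semantics; anchor=(0, 0) puts the kernel's top-left tap on the output pixel) is to compute the full-size result and crop:

import numpy as np
import cv2

img = np.random.rand(2048, 2048)
kernel = np.ones((32, 32), dtype=np.float64)

kh, kw = kernel.shape
# With the anchor at the top-left tap, the pixels whose kernel footprint
# stays entirely inside the image form the top-left
# (H-kh+1) x (W-kw+1) block of the full-size output.
full = cv2.filter2D(img, -1, kernel, anchor=(0, 0),
                    borderType=cv2.BORDER_CONSTANT)
valid = full[:img.shape[0] - kh + 1, :img.shape[1] - kw + 1]
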
Ruthvik Vaila

SciPy has a function fftconvolve that can be used for 1D and 2D signals.

from scipy import signal
from scipy import misc
import numpy as np
import matplotlib.pyplot as plt

# Build a 2D Gaussian kernel as the outer product of two 1D Gaussian
# windows, then blur a sample image via an FFT-based convolution.
face = misc.face(gray=True)
kernel = np.outer(signal.gaussian(70, 8), signal.gaussian(70, 8))
blurred = signal.fftconvolve(face, kernel, mode='same')

fig, (ax_orig, ax_kernel, ax_blurred) = plt.subplots(3, 1, figsize=(6, 15))
ax_orig.imshow(face, cmap='gray')
ax_orig.set_title('Original')
ax_orig.set_axis_off()
ax_kernel.imshow(kernel, cmap='gray')
ax_kernel.set_title('Gaussian kernel')
ax_kernel.set_axis_off()
ax_blurred.imshow(blurred, cmap='gray')
ax_blurred.set_title('Blurred')
ax_blurred.set_axis_off()
plt.show()