Several users have asked about the speed or memory consumption of image convolutions in numpy or scipy [1, 2, 3, 4]. From the responses and my own experience with numpy, I believe this may be a major shortcoming of numpy compared to Matlab or IDL.
None of the answers so far have addressed the overall question, so here it is: "What is the fastest method for computing a 2D convolution in Python?" Common Python modules are fair game: numpy, scipy, and PIL (others?). For the sake of a challenging comparison, I'd like to propose the following rules:
- The input matrices (image and kernel) are 2048x2048 and 32x32, respectively.
- Single or double precision floating point are both acceptable.
- Time spent converting your input matrix to the appropriate format doesn't count -- just the convolution step.
- Replacing the input matrix with your output (i.e., an in-place convolution) is acceptable. Does any Python library support that?
- Direct DLL calls to common C or Fortran libraries (e.g., LAPACK or ScaLAPACK) are all right.
- PyCUDA is right out. It's not fair to use your custom GPU hardware.
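A minimal timing harness that follows these rules might look like the sketch below. It assumes NumPy 1.17+ (for `default_rng`) and SciPy 1.4+ (for `oaconvolve`); the helper names `make_inputs`, `METHODS`, `best_time`, and `run_benchmark` are hypothetical, and the three SciPy routines are only example entries, not a claim about which method is fastest. Input generation happens before timing, so only the convolution call itself is measured.

```python
import time

import numpy as np
from scipy import signal


def make_inputs(n=2048, k=32, dtype=np.float32, seed=0):
    """Random n x n image and k x k kernel (hypothetical test data)."""
    rng = np.random.default_rng(seed)
    image = rng.standard_normal((n, n)).astype(dtype)
    kernel = rng.standard_normal((k, k)).astype(dtype)
    return image, kernel


# Example candidates only; each returns an array with the image's shape.
METHODS = {
    "signal.fftconvolve": lambda a, b: signal.fftconvolve(a, b, mode="same"),
    "signal.oaconvolve": lambda a, b: signal.oaconvolve(a, b, mode="same"),
    "signal.convolve (direct)": lambda a, b: signal.convolve(
        a, b, mode="same", method="direct"
    ),
}


def best_time(func, image, kernel, repeats=3):
    """Best wall-clock time over `repeats` calls; only the convolution is timed."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(image, kernel)
        best = min(best, time.perf_counter() - start)
    return best


def run_benchmark(n=2048, k=32, dtype=np.float32, repeats=3):
    """Build inputs once (untimed), then time each candidate in METHODS."""
    image, kernel = make_inputs(n, k, dtype)
    return {
        name: best_time(func, image, kernel, repeats)
        for name, func in METHODS.items()
    }
```

Calling `run_benchmark()` with its defaults uses the 2048x2048 / 32x32 single-precision setup; the direct method is likely to dominate total run time at that size, so it can be dropped from `METHODS` for quick iterations. Further candidates (PIL, ctypes calls into a compiled library, etc.) would be added as extra entries with the same `(image, kernel) -> array` signature.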