NumPy uses buffers for ufuncs:
Apparently this buffer size parameter can have a significant effect on speed, e.g.,
In [1]: x = np.ones(int(1e7))
In [2]: timeit x.sum()
100 loops, best of 3: 4.88 ms per loop
In [3]: np.getbufsize()
Out[3]: 8192
In [4]: np.setbufsize(64)
Out[4]: 8192
In [5]: timeit x.sum()
10 loops, best of 3: 18 ms per loop
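To make the comparison above easier to repeat across several buffer sizes, here is a minimal sketch of a timing script. It uses only `np.setbufsize`, `np.getbufsize` and the standard `timeit` module. The helper name `best_sum_time` and the particular sizes tried are my own choices for illustration, and whether the slowdown reproduces at all may depend on the NumPy version.

```python
import timeit

import numpy as np


def best_sum_time(x, bufsize, repeat=3, number=10):
    """Best per-call time in seconds of x.sum() under a given ufunc buffer size."""
    previous = np.setbufsize(bufsize)  # returns the buffer size that was in effect
    try:
        timings = timeit.repeat(x.sum, repeat=repeat, number=number)
    finally:
        np.setbufsize(previous)  # always restore the earlier setting
    return min(timings) / number


if __name__ == "__main__":
    x = np.ones(int(1e7))
    # Sizes are illustrative; all are multiples of 16, which older NumPy requires.
    for bufsize in (64, 1024, 8192, 65536):
        print(f"bufsize={bufsize:6d}: {best_sum_time(x, bufsize) * 1e3:.2f} ms per sum")
```

Restoring the previous size in a `finally` block keeps one measurement from leaking its setting into the next.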
Furthermore, if I, for example, write a Fortran function:
function compute_sum(n, x) result(s)
!f2py integer, intent(hide), depend(x) :: n = shape(x,0)
!f2py double precision, intent(in) :: x(n)
!f2py double precision, intent(out) :: s
integer n
double precision x(n)
double precision s
s = 0.0d0  ! initialize the accumulator before summing
do i=1,n
s = s + x(i)
end do
end function
and compile it with f2py:
f2py -c mysum.f90 -m mysum
then the Fortran code is twice as slow as the NumPy sum:
In [1]: import mysum
In [2]: x = np.ones(int(1e7))
In [3]: timeit mysum.compute_sum(x)
100 loops, best of 3: 10.2 ms per loop
I imagine there's a little overhead in passing data back and forth between Python and the Fortran code (although the array is not copied in), but certainly not 5 milliseconds' worth!
Two questions:
- What exactly does the NumPy documentation mean by "buffers are used for misaligned data, etc ..." and how does this affect how sum is calculated?
- What would the analogous Fortran code (i.e., running nearly as fast as the NumPy sum) look like for a given buffer size?
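To make the second question concrete, the kind of block-wise accumulation I have in mind can be sketched in Python as follows. This is only an illustration of chunked summation, assuming the buffer size counts elements per block. It is not NumPy's actual implementation, and `blocked_sum` is a hypothetical helper.

```python
import numpy as np


def blocked_sum(x, bufsize=8192):
    """Sum a 1-D array by accumulating one block of `bufsize` elements at a time."""
    total = 0.0
    for start in range(0, x.size, bufsize):
        # Each slice is a view, so no data is copied; only the reduction is chunked.
        total += x[start:start + bufsize].sum()
    return total
```

For example, `blocked_sum(np.ones(int(1e7)), 64)` walks the array in 64-element pieces. A Fortran analogue would presumably use an outer loop over blocks and an inner loop over each block.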