If a function has been written to take advantage of NumPy's compiled code (or is itself compiled), it will be hard to improve on its performance with caching.
In [1]: import numpy as np
   ...: from scipy.special import gammaln
In [2]: np_array = np.random.randint(5, size=500)*1000 + 3
In [3]: np_array.shape
Out[3]: (500,)
In [4]: timeit gammaln(np_array).sum()
10000 loops, best of 3: 34.9 µs per loop
Evaluating the function for each element individually is much slower:
In [5]: timeit sum([gammaln(i) for i in np_array])
1000 loops, best of 3: 1.8 ms per loop
Even simply iterating through the array, without calling the function at all, is slower than the vectorized call:
In [6]: timeit sum([i for i in np_array])
10000 loops, best of 3: 127 µs per loop
In this case the values come from a small set of integers:
In [10]: np.unique(np_array)
Out[10]: array([ 3, 1003, 2003, 3003, 4003])
But identifying those unique values takes nearly as long as calculating the function for all 500.
In [11]: timeit np.unique(np_array)
...
In [12]: timeit gammaln(np.unique(np_array))
In fact it is slower if I also ask for the indices that can reconstruct the array:
In [14]: timeit np.unique(np_array, return_inverse=True)
10000 loops, best of 3: 54 µs per loop
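As a quick sanity check (my own illustration, using the np_array defined above), the inverse indices returned here do rebuild the original array exactly:

unq, inv = np.unique(np_array, return_inverse=True)   # unique values plus inverse mapping
assert np.array_equal(unq[inv], np_array)             # unq[inv] reconstructs the original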
There are caching or memoizing recipes for Python, including the @functools.lru_cache decorator. But I haven't seen them used much with numpy arrays. If your problem is suitable for caching, I suspect it is best done in lower-level code, e.g. with Cython and existing C or C++ cache libraries.
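For completeness, here is a minimal sketch of what that per-element memoization looks like; gammaln_cached is my own name, not a library function, and since ndarrays are unhashable the cache can only work scalar by scalar:

from functools import lru_cache
from scipy.special import gammaln

@lru_cache(maxsize=None)                 # cache keyed by the hashable scalar argument
def gammaln_cached(x):
    return float(gammaln(x))             # scalar evaluation; repeats become dict lookups

total = sum(gammaln_cached(int(i)) for i in np_array)   # int() so equal values share a key

Even with a perfect hit rate this still pays Python-level call overhead per element, which is why it rarely competes with the vectorized call.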
Cache with unique
We could get cache-like behavior with unique: use it to get both the unique values and the indices that let us recreate the original array:
In [5]: unq, idx = np.unique(np_array, return_inverse=True)
In [6]: res1 = gammaln(np_array)
In [7]: res2 = gammaln(unq)
In [8]: res2 = res2[idx]
In [9]: np.allclose(res1, res2)
Out[9]: True
But compare the times:
In [10]: timeit res1 = gammaln(np_array)
10000 loops, best of 3: 29.1 µs per loop
In [11]: %%timeit
...: unq, idx = np.unique(np_array, return_inverse=True)
...: res2 = gammaln(unq)[idx]
10000 loops, best of 3: 63.3 µs per loop
Most of this extra time goes into computing the unique values; once we have the mapping, applying it is quite fast:
In [12]: timeit res2 = gammaln(unq)[idx]
100000 loops, best of 3: 6.12 µs per loop
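If this pattern comes up often, it can be wrapped in a small helper; apply_via_unique is my own sketch, not an existing function:

def apply_via_unique(func, arr):
    # evaluate func once per unique value, then broadcast back via the inverse indices
    unq, inv = np.unique(arr, return_inverse=True)
    return func(unq)[inv]

res = apply_via_unique(gammaln, np_array)   # same result as gammaln(np_array)

It only pays off when func is expensive enough per element to amortize the cost of the unique sort.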
unique with sum
Returning the sum of the values rather than the values themselves shifts the relative times a bit:
In [13]: %timeit res1 = gammaln(np_array).sum()
10000 loops, best of 3: 35.3 µs per loop
In [14]: %%timeit
...: unq, idx = np.unique(np_array, return_inverse=True)
...: res2 = gammaln(unq)[idx].sum()
10000 loops, best of 3: 71.3 µs per loop
With Paul Panzer's count approach, which weights each unique value by its multiplicity instead of expanding back to the full array:
In [21]: %%timeit
...: unq, cnt = np.unique(np_array, return_counts=True)
...: res = (gammaln(unq)*cnt).sum()
10000 loops, best of 3: 59.5 µs per loop
In [22]: %%timeit
...: unq, cnt = np.unique(np_array, return_counts=True)
...: res = np.dot(gammaln(unq), cnt)
10000 loops, best of 3: 52.1 µs per loop
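The same idea as a reusable sketch (again my own naming): the counts turn the full sum into a weighted sum over the unique values:

def sum_via_unique(func, arr):
    # weight each unique value by its count instead of evaluating func len(arr) times
    unq, cnt = np.unique(arr, return_counts=True)
    return np.dot(func(unq), cnt)

total = sum_via_unique(gammaln, np_array)   # equals gammaln(np_array).sum()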