
In numpy (1.8), I want to move this computation out of a Python loop into something more numpy-ish for better performance:

(width, height) = base.shape
(toolw, toolh) = tool.shape
zdiff = np.empty((width - toolw, height - toolh))  # preallocate the result
for i in range(0, width-toolw):
    for j in range(0, height-toolh):
        zdiff[i,j] = (tool - base[i:i+toolw, j:j+toolh]).min()

base is a ~2000x2000 array, and tool is a 25x25 array. (Background context: base and tool are height maps, and I'm trying to figure out the closest approach of tool as it is moved all over base.)

I'm trying to use a striding trick, starting with this:

base_view = np.lib.stride_tricks.as_strided(base, shape=(2000, 2000, 25, 25), 
                                            strides=(base.strides * 2))

This would make base_view[10, 20] a 25x25 array of values from base, with its upper-left corner at (10, 20).

However, this is failing with "array is too big". From testing different values, it looks like the error is reported when the potential size of the array (e.g. 2000*2000*25*25*8 bytes) exceeds roughly 2^32, triggering an overflow check that multiplies out all the dimensions. (I am on a 32-bit Python installation.)
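Spelling out the arithmetic, the element count alone already overflows a signed 32-bit integer before the byte size even comes into it:

>>> 2000 * 2000 * 25 * 25       # total elements in the requested view
2500000000
>>> 2**31 - 1                   # maximum signed 32-bit value (np.intp on 32-bit)
2147483647
>>> 2000 * 2000 * 25 * 25 * 8   # virtual size in bytes for float64
20000000000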

I feel like I'm missing something -- why won't it let me create this "stride view" when the stride values clearly work? Is there a way to force this?

And more generally, is there a way to optimize my loop above?

Updated: exact error:

ValueError                                Traceback (most recent call last)
<ipython-input-14-313b3d6c74fa> in <module>()
----> 1 newa = np.lib.stride_tricks.as_strided(base, shape=(1000, 1000, 25, 25), strides=(base.strides * 2))

C:\Python27\lib\site-packages\numpy\lib\stride_tricks.pyc in as_strided(x, shape, strides)
     28     if strides is not None:
     29         interface['strides'] = tuple(strides)
---> 30     array = np.asarray(DummyArray(interface, base=x))
     31     # Make sure dtype is correct in case of custom dtype
     32     array.dtype = x.dtype

C:\Python27\lib\site-packages\numpy\core\numeric.pyc in asarray(a, dtype, order)
    458 
    459     """
--> 460     return array(a, dtype, copy=False, order=order)
    461 
    462 def asanyarray(a, dtype=None, order=None):

ValueError: array is too big.
  • What's the exact error? On a side note, this is a good candidate for `scipy.ndimage.generic_filter`, though that's likely to be slower than your strided approach. – Joe Kington Nov 14 '13 at 21:26
  • I looked at the scipy.ndimage generic filter. The filter is a Python function, which doesn't move the loop into numpy, which is where (I suspect) the real speedup lies. – payne Nov 14 '13 at 21:45
  • @alko: it was that thread that sent me down this path. But the approach breaks if the dimensions multiply out too big. – payne Nov 14 '13 at 21:46
  • On a side note, you're using the wrong strides and shape to do the "rolling window" trick. If you have a 2000x2000 array and a 25x25 window, the shape should be `(1976, 1976, 25, 25)`, not `(2000, 2000, 25, 25)` (see the sketch after these comments). – Joe Kington Nov 14 '13 at 22:10
  • The error you are seeing is raised [here](https://github.com/numpy/numpy/blob/4050ac73af79ae8cc513648ff02e9a22041501c4/numpy/core/src/multiarray/ctors.c#L964). It computes the total size of the array, which overflows an `np.intp`, which on a 32-bit system is an int32. It does seem like the virtual size of your array shouldn't be an issue; the check would need to be replaced with one based on the sum of the products of the strides with their corresponding dimensions. All your problems would be gone if you switched to a 64-bit version of Numpy... – Jaime Nov 14 '13 at 22:10
  • @JoeKington No, `as_strided` doesn't check for that; you can mess things up big time if you do the math wrong and start writing or reading beyond the actual array boundaries. – Jaime Nov 14 '13 at 22:11
  • @Jaime - Yep, you're right (I edited my comment just after posting it). His current code will segfault if you try to do anything with `base_view`, even on 64-bit. – Joe Kington Nov 14 '13 at 22:13
  • @Jaime I found the same check. It appears to ignore the stride values and just assumes the array is "regularly strided". In cases like this, where you're fiddling with strides, it seems overly restrictive. – payne Nov 14 '13 at 22:22
  • Do you need the whole `zdiff` array, or are you only trying to figure out where the minimum (or maximum) of `zdiff` is? That could be calculated much, much faster. – Jaime Nov 14 '13 at 23:44
  • If your total number of array elements is larger than `np.intp`, you would run into many problems elsewhere. The check is there for a reason. The memory size is a separate issue from that. – seberg Nov 15 '13 at 00:13
  • @seberg In this case, why would that be? The array location is computed by multiplying the index value for each axis by the stride for that axis, and then summing. The location value, intermediate or final, never gets close to intmax. – payne Nov 15 '13 at 00:57
  • Because you can't even iterate through all the elements while tracking a single flat index (to keep track of how far you've gone), for example. Sure, a lot might still work, but a lot will probably break, and I wouldn't be surprised if removing it created huge bugs everywhere. – seberg Nov 15 '13 at 01:04
  • Issue submitted: https://github.com/numpy/numpy/issues/4069 – payne Nov 23 '13 at 12:28
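Following Joe Kington's comment about the shape, here is a small-scale sketch of the corrected rolling-window view (shown on a 100x100 array so it also fits comfortably within 32-bit limits; on the real data the outer shape would be (1976, 1976, 25, 25)):

import numpy as np

base = np.arange(100 * 100, dtype=np.float64).reshape(100, 100)
toolh = toolw = 25

# The outer shape is base.shape minus the window size plus 1; the strides
# for the two window axes simply reuse the row/column strides of base.
view = np.lib.stride_tricks.as_strided(
    base,
    shape=(100 - toolh + 1, 100 - toolw + 1, toolh, toolw),
    strides=base.strides * 2,
)

# view[10, 20] is the 25x25 block whose upper-left corner is at (10, 20).
assert np.array_equal(view[10, 20], base[10:35, 20:45])

(For what it's worth, numpy 1.20+ ships np.lib.stride_tricks.sliding_window_view, which builds the same kind of read-only view with the bounds arithmetic done for you; it isn't available on the 1.8 install in the question.)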

1 Answer


I can't really help you with the strides approach, but I do have a method that should be faster than your original code. It loops over the tool array instead of over the base array, which means that, while not fully vectorized, a lot more of the work is pushed into numpy.

Note that in your original code I changed the ranges and switched the widths and heights, because I assume that is what you intended.

import numpy as np

height, width = 500, 500
toolh, toolw = 6, 6

base = np.random.rand(height, width)
tool = np.random.rand(toolh, toolw)

m, n = height-toolh+1, width-toolw+1

def height_diff_old(base, tool):
    zdiff = np.empty((m, n))
    for i in range(m):
        for j in range(n):
            zdiff[i, j] = (tool - base[i:i+toolh, j:j+toolw]).min()
    return zdiff

def height_diff_new(base, tool):
    # Start from +inf so the first comparison always wins.
    zdiff = np.empty((m, n))
    zdiff.fill(np.inf)
    # Loop over the small tool array; each iteration compares one tool
    # element against every window position in base at once.
    for i in range(toolh):
        for j in range(toolw):
            diff_ij = tool[i, j] - base[i:i+m, j:j+n]
            np.minimum(zdiff, diff_ij, out=zdiff)  # elementwise running minimum
    return zdiff

Of course you'd want to calculate the heights and widths in your actual function, but for testing it was easier to have them as globals.

For the given array sizes the original code runs in 7.38 seconds while the new code takes only 206 milliseconds on my system. I assume the new code is faster for your array sizes as well but I'm not sure how well it scales :)
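If you want to convince yourself that the two versions agree before timing them, here's a quick sketch (assuming the definitions and globals from the snippet above are in scope):

import timeit

# Both implementations should produce the same result.
assert np.allclose(height_diff_old(base, tool), height_diff_new(base, tool))

# One way to time them; number=1 because the old version is slow.
print(timeit.timeit(lambda: height_diff_old(base, tool), number=1))
print(timeit.timeit(lambda: height_diff_new(base, tool), number=1))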

Other alternatives that may or may not be of interest to you are Numba or Cython, which in many cases should be faster than any "vectorized" numpy code you can come up with.
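For illustration, a minimal Numba version of the original double loop might look like the sketch below (assumes a reasonably recent Numba install; the function name is mine):

import numpy as np
from numba import njit

@njit
def height_diff_numba(base, tool):
    toolh, toolw = tool.shape
    m = base.shape[0] - toolh + 1
    n = base.shape[1] - toolw + 1
    zdiff = np.empty((m, n))
    for i in range(m):
        for j in range(n):
            # Explicit scalar loops compile well under Numba, so the
            # Python-level loop overhead largely disappears.
            best = np.inf
            for k in range(toolh):
                for l in range(toolw):
                    d = tool[k, l] - base[i + k, j + l]
                    if d < best:
                        best = d
            zdiff[i, j] = best
    return zdiff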