
Problem

  • I have an ndarray, arr, that is an n-dimensional cube with length m in each dimension.

  • I want to apply a function, func, by slicing along the first dimension (axis 0) and taking each (n-1)-dim slice as an input to the function.

This seems to work with map() but I can't find a numpy variant that is appropriate. np.vectorize seems to split the (n-1)-tensor into individual scalar entries. Neither apply_along_axis nor apply_over_axes seems appropriate either.

My problem is such that I need to pass arbitrary functions as inputs, so I do not see a solution based on einsum being feasible either.

Question

  • Do you know the best numpy alternative to using np.asarray(map(func, arr))?

Example

I define an example array, arr, as a 4-dim cube (or 4-tensor) by:

import numpy as np

m, n = 3, 4
arr = np.arange(m**n).reshape((m,)*n)

I define an example function f,

def f(x):
    """Makes it obvious how the np.ndarray is being passed into the function."""
    try:  # perform an op using x[0,0,0], which is expected to exist
        i = x[0, 0, 0]
    except (IndexError, TypeError):  # x may be a scalar with no elements
        print('\nno element x[0,0,0] in x: \n{}'.format(x))
        return np.nan
    return x - x + i

The expected result, res, from this function would remain the same shape but would satisfy the following:

print(all([(res[i] == i*m**(n-1)).all() for i in range(m)]))

This works with the default map() function,

res = np.asarray(list(map(f, arr)))  # list(...) needed on Python 3
print(all([(res[i] == i*m**(n-1)).all() for i in range(m)]))
True

I would expect np.vectorize to work in the same way as map(), but it acts on scalar entries:

res = np.vectorize(f)(arr)

no element x[0,0,0] in x: 
0
...

1 Answer


Given that arr is 4d, and your fn works on 3d arrays,

np.asarray(map(func, arr))   # list(map(...)) for Py3

looks perfectly reasonable. I'd use the list comprehension form, but that's a matter of programming style:

np.asarray([func(i) for i in arr])

for i in arr iterates on the first dimension of arr. In effect it treats arr as a list of the 3d arrays. And then it reassembles the resulting list into a 4d array.
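
If you want control over which axis the results are joined on, np.stack does the same reassembly; a small sketch, reusing f and arr from the question:

res = np.stack([f(a) for a in arr])          # equivalent to np.asarray([...])
res = np.stack([f(a) for a in arr], axis=1)  # join the results along axis 1 instead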

The np.vectorize doc could be more explicit about the function taking scalars. But yes, it passes values as scalars. Note that np.vectorize does not have provision for passing an iteration axis parameter. It's most useful when your function takes values from several arrays, something like

 [func(a,b) for a,b in zip(arrA, arrB)]

It generalizes the zip to allow for broadcasting. But otherwise it is an iterative solution. It knows nothing about the guts of your func, so it can't speed up its calls.
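
A small sketch of that zip-with-broadcasting behaviour (g and the input arrays here are made up for illustration):

def g(a, b):
    # np.vectorize calls g once per broadcast element, with scalar a and b
    return a * 10 + b

vg = np.vectorize(g)
A = np.arange(3)[:, None]   # shape (3, 1)
B = np.arange(4)            # shape (4,)
print(vg(A, B).shape)       # (3, 4), broadcast like A * 10 + B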

np.vectorize ends up calling np.frompyfunc, which, being a bit less general, is a bit faster. But it too passes scalars to the func.
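
For comparison, a sketch of calling np.frompyfunc directly; note that it hands back an object-dtype array:

uf = np.frompyfunc(lambda x: x * 10, 1, 1)   # 1 input, 1 output
print(uf(np.arange(4)))                      # [0 10 20 30], object dtype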

np.apply_along/over_ax(e/i)s also iterate over one or more axes. You may find their code instructive, but I agree they don't apply here.
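
To see why they don't fit, a quick sketch: apply_along_axis passes 1-d slices to its function, not (n-1)-d blocks:

# each v is a 1-d slice of shape (3,) along axis 0,
# not the (3, 3, 3) block that f expects
print(np.apply_along_axis(lambda v: v.sum(), 0, arr).shape)   # (3, 3, 3)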

A variation on the map approach is to allocate the result array, and index:

In [45]: res = np.zeros_like(arr, int)
In [46]: for i in range(arr.shape[0]):
    ...:     res[i,...] = f(arr[i,...])

This may be easier if you need to iterate on a different axis than the 1st.
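
For example, a sketch of the same loop along axis 1, again reusing f and arr from the question:

res = np.zeros_like(arr)
for i in range(arr.shape[1]):
    # each arr[:,i,...] slice is still 3d, so f sees a (3, 3, 3) block
    res[:, i, ...] = f(arr[:, i, ...])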

You need to do your own timings to see which is faster.
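
A quick way to run those timings, as a sketch (numbers will vary with array size and machine):

import timeit

for stmt in ('np.asarray(list(map(f, arr)))',
             'np.asarray([f(i) for i in arr])'):
    # globals=globals() lets timeit see f, arr, np (Python 3.5+)
    print(stmt, timeit.timeit(stmt, number=1000, globals=globals()))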

========================

An example of iteration over the 1st dimension with in-place modification:

In [58]: arr.__array_interface__['data']  # data buffer address
Out[58]: (152720784, False)

In [59]: for i,a in enumerate(arr):
    ...:     print(a.__array_interface__['data'])
    ...:     a[0,0,:]=i
    ...:     
(152720784, False)   # address of the views (same buffer)
(152720892, False)
(152721000, False)

In [60]: arr
Out[60]: 
array([[[[ 0,  0,  0],
         [ 3,  4,  5],
         [ 6,  7,  8]],

        ...

       [[[ 1,  1,  1],
         [30, 31, 32],
         ...

       [[[ 2,  2,  2],
         [57, 58, 59],
         [60, 61, 62]],
       ...]]])

When I iterate over an array, I get a view that starts at successive points in the common data buffer. If I modify the view, as above or even with a[:] = ..., I modify the original. I don't have to write anything back. But don't use a = ..., which rebinds the name and breaks the link to the original array.
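
A minimal sketch of that distinction:

v = arr[0]               # v is a view into arr's data buffer
v[:] = 99                # writes through the view; arr[0] is now all 99
v = np.zeros((3, 3, 3))  # rebinds the name v; arr is untouched from here on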

  • the problem with `map()` and generator expressions is that they reallocate memory locations for the `ndarray`, which is wasteful for large arrays - I was hoping there was a `numpy` function that would just use the existing memory pointers to the current array locations – Alexander McFarlane Jul 12 '16 at 19:13
  • `arr[i,...] = f(arr[i,...])` will put the subarrays back in the original array. Whether temporary arrays, copies or views are created depends on the details of what `f` is doing. But I wonder how well you understand the use of `memory pointers` in numpy arrays. – hpaulj Jul 12 '16 at 19:17
  • Thanks for that hint! I would say I understand better than most but probably average-low among python experts - [this book](http://web.mit.edu/dvp/Public/numpybook.pdf#153) got me most of the way, particularly section 2.3 and 3.1.1 – Alexander McFarlane Jul 12 '16 at 20:25
  • I'll try to write an example of `f` that modifies its input `x` in-place and doesn't need the last assignment – hpaulj Jul 12 '16 at 20:39
  • thanks - I'm manipulating lattices of sizes up to `48**4` or possibly even `48**7`, but for the latter I will probably use `C` or `Fortran` instead, so memory usage is worth thinking about even if it's just a comment in my code as a reminder – Alexander McFarlane Jul 12 '16 at 20:48
  • I added a simple inplace modification of the array. – hpaulj Jul 13 '16 at 00:28