Ignore dimension when using np.einsum

Question

I use np.einsum to calculate the flow of material in a graph (1 node to 4 nodes in this example). The amount of flow is given by amount (amount.shape == (1, 1, 2) the dimensions define certain criteria, let's call them a, b, c).

The boolean matrix route determines the permissible flow based on the a, b, c criteria into y (route.shape == (4, 1, 1, 2); yabc). I label the dimensions y, a, b, c. abc are equivalent to amounts dimensions abc, y is the direction of the flow (0, 1, 2 or 3). To determine the amount of material in y, I calculate np.einsum('abc,yabc->y', amount, route) and get a y-dim vector with the flows into y. There's also an implicit priorisation of the route. For instance, any route[0, ...] == True is False for any y=1..3, any route[1, ...] == True is False for the next higher y-dim routes and so on. route[3, ...] (last y-index) defines the catch-all route, that is, its values are True when previous y-index values were False ((route[0] ^ route[1] ^ route[2] ^ route[3]).all() == True).

This works fine. However, when I introduce another criteria (dimension) x which only exists in route, but not in amount, this logic seems to break. The below code demonstrates the problem:

>>> import numpy as np

>>> amount = np.asarray([[[5000.0, 0.0]]]) 

>>> route = np.asarray([[[[[False, True]]], [[[False, True]]], [[[False, True]]]], [[[[True, False]]], [[[False, False]]], [[[False, False]]]], [[[[False, False]]], [[[True, False]]], [[[False, False]]]], [[[[False, False]]], [[[False, False]]], [[[True, False]]]]], dtype=bool)

>>> amount.shape
(1, 1, 2)

>>> Added dimension `x`
>>> # y,x,a,b,c
>>> route.shape
(4, 3, 1, 1, 2)

>>> # Attempt 1: `5000` can flow into y=1, 2 or 3. I expect
>>> # `flows1.sum() == amount.sum()` as it would be without `x`.
>>> # Correct solution would be `[0, 5000, 0, 0]` because material is routed
>>> # to y=1, and is not available for y=2 and y=3 as they are lower
>>> # priority (higher index)
>>> flows1 = np.einsum('abc,yxabc->y', amount, route)
>>> flows1
array([   0., 5000., 5000., 5000.])

>>> # Attempt 2: try to collapse `x` => not much different, duplication
>>> np.einsum('abc,yabc->y', amount, route.any(1))
array([   0., 5000., 5000., 5000.])

>>> # This is the flow by `y` and `x`. I'd only expect a `5000` in the
>>> # 2nd row (`[5000.,    0.,    0.]`) not the others.
>>> np.einsum('abc,yxabc->yx', amount, route)
array([[   0.,    0.,    0.],
       [5000.,    0.,    0.],
       [   0., 5000.,    0.],
       [   0.,    0., 5000.]])

Is there any feasible operation which I can apply to route (.all(1) doesn't work either) to ignore the x-dimension?

Another example:

>>> amount2 = np.asarray([[[5000.0, 1000.0]]])
>>> np.einsum('abc,yabc->y', amount2, route.any(1))
array([1000., 5000., 5000., 5000.])

can be interpreted as 1000.0 being routed to y=0 (and none of the other y-destinations) and 5000.0 being compatible with destination y=1, y=2 and y=3, but ideally, I'd only like to show 5000.0 up in y=1 (as that's the lowest index and highest destination priority).

Solution attempt

The below works, but is not very numpy-ish. It'll be great if the loop could be eliminated.

# Initialise destination
result = np.zeros((route.shape[0]))
# Calculate flow by maintaining all dimensions (this will cause
# double ups because `x` is not part of `amount2`
temp = np.einsum('abc,yxabc->yxabc', amount2, route)
temp_ixs = np.asarray(np.where(temp))

# For each original amount, find the destination (`y`)
for a, b, c in zip(*np.where(amount2)):
    # Find where dimensions `abc` are equal in the destination.
    # Take the first vector which contains `yxabc` (we get `yx` as result)
    ix = np.where((temp_ixs[2:].T == [a, b, c]).all(axis=1))[0][0]
    y_ix = temp_ixs.T[ix][0]
    # ignored
    x_ix = temp_ixs.T[ix][1]
    v = amount2[a, b, c]
    # build resulting destination
    result[y_ix] += v

# result == array([1000., 5000.,    0.,    0.])

With other words for each value in amount2, I am looking for the lowest indices yx in temp so that the value can be written to result[y] = value (x is ignored).

>>> temp = np.einsum('abc,yxabc->yx', amount2, route)
>>> temp
        #  +--- value=1000 at y=0 => result[0] += 1000
        # /
array([[1000., 1000., 1000.],
        #  +--- value=5000 at y=1 => result[1] += 5000
        # /
       [5000.,    0.,    0.],
       [   0., 5000.,    0.],
       [   0.,    0., 5000.]])
>>> result
array([1000., 5000.,    0.,    0.])
>>> amount2
array([[[5000., 1000.]]])

Another attempt to reduce the dimensionality of route is:

>>> r = route.any(1)
>>> for x  in xrange(1, route.shape[0]):
        r[x] = r[x] & (r[:x] == False).all(axis=0)

>>> np.einsum('abc,yabc->y', amount2, r)
array([1000., 5000.,    0.,    0.])

This essentially preserves above-mentioned priority given by the first dimension of route. Any lower priority (higher index) array cannot contain a True value when a higher priority array has a value of True already at that sub index. While this is a lot better than my explicit approach, it would be great if the for x in xrange... loop could be expressed as numpy vector operation.

First case sums the columns of ythe last. The `any` in the 2 d has the same effect. Looks like 'ignore' means take the first column of the 2d result, `[:,0]` — hpaulj, Jun 05 '19 at 05:07
Another clumsy solution would be to calculate flow = `abc,yabc->yabc` (keep all dimensions) and check for the first occurrence of the indices `abc` of `np.where(amount)` in `np.asarray(np.where(flow))[1:] == abc_indices` to obtain the first `y` where this value is present first in `flow`. However, casting this in a numpy operation might be challenging/impossible... — orange, Jun 05 '19 at 05:36

score 0 · Answer 1 · answered Jun 05 '19 at 17:25

I haven't tried to follow your 'flow' interpretation of the multiplication problem. I'm just focusing on the calculation options.

Stripped of unnecessary dimensions, your arrays are:

In [194]: amount                                                                                       
Out[194]: array([5000.,    0.])
In [195]: route                                                                                        
Out[195]: 
array([[[0, 1],
        [0, 1],
        [0, 1]],

       [[1, 0],
        [0, 0],
        [0, 0]],

       [[0, 0],
        [1, 0],
        [0, 0]],

       [[0, 0],
        [0, 0],
        [1, 0]]])

And the yx calculation is:

In [197]: np.einsum('a,yxa->yx',amount, route)                                                         
Out[197]: 
array([[   0.,    0.,    0.],
       [5000.,    0.,    0.],
       [   0., 5000.,    0.],
       [   0.,    0., 5000.]])

which is just this slice of route times 5000.

In [198]: route[:,:,0]                                                                                 
Out[198]: 
array([[0, 0, 0],
       [1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])

Omit the x on the RHS of the einsum results in summation across the dimension.

Equivalently we can multiply (with broadcasting):

In [200]: (amount*route).sum(axis=2)                                                                   
Out[200]: 
array([[   0.,    0.,    0.],
       [5000.,    0.,    0.],
       [   0., 5000.,    0.],
       [   0.,    0., 5000.]])
In [201]: (amount*route).sum(axis=(1,2))                                                               
Out[201]: array([   0., 5000., 5000., 5000.])

Maybe looking at amount*route will help visualize the problem. You can also use max, min, argmax etc instead of sum, or along with it on one or more of the axes.

Thanks. Please have a look at my **Solution attempt** section. Seeing an actual algorithm might provide a better explanation. All dimensions (abc) even though they are 1 are required as this is only an example and in reality the dimensions are larger. — orange, Jun 06 '19 at 03:01
I boiled down the problem to vectorising the operation of `for x in xrange(1, route.shape[0]): r[x] = r[x] & (r[:x] == False).all(axis=0)` in numpy. Any idea? — orange, Jun 06 '19 at 04:29

Ignore dimension when using np.einsum

1 Answers1