0

I use np.einsum to calculate the flow of material in a graph (1 node to 4 nodes in this example). The amount of flow is given by amount (amount.shape == (1, 1, 2) the dimensions define certain criteria, let's call them a, b, c).

The boolean matrix route determines the permissible flow based on the a, b, c criteria into y (route.shape == (4, 1, 1, 2); yabc). I label the dimensions y, a, b, c. abc are equivalent to amounts dimensions abc, y is the direction of the flow (0, 1, 2 or 3). To determine the amount of material in y, I calculate np.einsum('abc,yabc->y', amount, route) and get a y-dim vector with the flows into y. There's also an implicit priorisation of the route. For instance, any route[0, ...] == True is False for any y=1..3, any route[1, ...] == True is False for the next higher y-dim routes and so on. route[3, ...] (last y-index) defines the catch-all route, that is, its values are True when previous y-index values were False ((route[0] ^ route[1] ^ route[2] ^ route[3]).all() == True).

This works fine. However, when I introduce another criteria (dimension) x which only exists in route, but not in amount, this logic seems to break. The below code demonstrates the problem:

>>> import numpy as np

>>> amount = np.asarray([[[5000.0, 0.0]]]) 

>>> route = np.asarray([[[[[False, True]]], [[[False, True]]], [[[False, True]]]], [[[[True, False]]], [[[False, False]]], [[[False, False]]]], [[[[False, False]]], [[[True, False]]], [[[False, False]]]], [[[[False, False]]], [[[False, False]]], [[[True, False]]]]], dtype=bool)

>>> amount.shape
(1, 1, 2)

>>> Added dimension `x`
>>> # y,x,a,b,c
>>> route.shape
(4, 3, 1, 1, 2)

>>> # Attempt 1: `5000` can flow into y=1, 2 or 3. I expect
>>> # `flows1.sum() == amount.sum()` as it would be without `x`.
>>> # Correct solution would be `[0, 5000, 0, 0]` because material is routed
>>> # to y=1, and is not available for y=2 and y=3 as they are lower
>>> # priority (higher index)
>>> flows1 = np.einsum('abc,yxabc->y', amount, route)
>>> flows1
array([   0., 5000., 5000., 5000.])

>>> # Attempt 2: try to collapse `x` => not much different, duplication
>>> np.einsum('abc,yabc->y', amount, route.any(1))
array([   0., 5000., 5000., 5000.])

>>> # This is the flow by `y` and `x`. I'd only expect a `5000` in the
>>> # 2nd row (`[5000.,    0.,    0.]`) not the others.
>>> np.einsum('abc,yxabc->yx', amount, route)
array([[   0.,    0.,    0.],
       [5000.,    0.,    0.],
       [   0., 5000.,    0.],
       [   0.,    0., 5000.]])

Is there any feasible operation which I can apply to route (.all(1) doesn't work either) to ignore the x-dimension?

Another example:

>>> amount2 = np.asarray([[[5000.0, 1000.0]]])
>>> np.einsum('abc,yabc->y', amount2, route.any(1))
array([1000., 5000., 5000., 5000.])

can be interpreted as 1000.0 being routed to y=0 (and none of the other y-destinations) and 5000.0 being compatible with destination y=1, y=2 and y=3, but ideally, I'd only like to show 5000.0 up in y=1 (as that's the lowest index and highest destination priority).

Solution attempt

The below works, but is not very numpy-ish. It'll be great if the loop could be eliminated.

# Initialise destination
result = np.zeros((route.shape[0]))
# Calculate flow by maintaining all dimensions (this will cause
# double ups because `x` is not part of `amount2`
temp = np.einsum('abc,yxabc->yxabc', amount2, route)
temp_ixs = np.asarray(np.where(temp))

# For each original amount, find the destination (`y`)
for a, b, c in zip(*np.where(amount2)):
    # Find where dimensions `abc` are equal in the destination.
    # Take the first vector which contains `yxabc` (we get `yx` as result)
    ix = np.where((temp_ixs[2:].T == [a, b, c]).all(axis=1))[0][0]
    y_ix = temp_ixs.T[ix][0]
    # ignored
    x_ix = temp_ixs.T[ix][1]
    v = amount2[a, b, c]
    # build resulting destination
    result[y_ix] += v

# result == array([1000., 5000.,    0.,    0.])

With other words for each value in amount2, I am looking for the lowest indices yx in temp so that the value can be written to result[y] = value (x is ignored).

>>> temp = np.einsum('abc,yxabc->yx', amount2, route)
>>> temp
        #  +--- value=1000 at y=0 => result[0] += 1000
        # /
array([[1000., 1000., 1000.],
        #  +--- value=5000 at y=1 => result[1] += 5000
        # /
       [5000.,    0.,    0.],
       [   0., 5000.,    0.],
       [   0.,    0., 5000.]])
>>> result
array([1000., 5000.,    0.,    0.])
>>> amount2
array([[[5000., 1000.]]])

Another attempt to reduce the dimensionality of route is:

>>> r = route.any(1)
>>> for x  in xrange(1, route.shape[0]):
        r[x] = r[x] & (r[:x] == False).all(axis=0)

>>> np.einsum('abc,yabc->y', amount2, r)
array([1000., 5000.,    0.,    0.])

This essentially preserves above-mentioned priority given by the first dimension of route. Any lower priority (higher index) array cannot contain a True value when a higher priority array has a value of True already at that sub index. While this is a lot better than my explicit approach, it would be great if the for x in xrange... loop could be expressed as numpy vector operation.

orange
  • 7,755
  • 14
  • 75
  • 139
  • First case sums the columns of ythe last. The `any` in the 2 d has the same effect. Looks like 'ignore' means take the first column of the 2d result, `[:,0]` – hpaulj Jun 05 '19 at 05:07
  • Another clumsy solution would be to calculate flow = `abc,yabc->yabc` (keep all dimensions) and check for the first occurrence of the indices `abc` of `np.where(amount)` in `np.asarray(np.where(flow))[1:] == abc_indices` to obtain the first `y` where this value is present first in `flow`. However, casting this in a numpy operation might be challenging/impossible... – orange Jun 05 '19 at 05:36

1 Answers1

0

I haven't tried to follow your 'flow' interpretation of the multiplication problem. I'm just focusing on the calculation options.

Stripped of unnecessary dimensions, your arrays are:

In [194]: amount                                                                                       
Out[194]: array([5000.,    0.])
In [195]: route                                                                                        
Out[195]: 
array([[[0, 1],
        [0, 1],
        [0, 1]],

       [[1, 0],
        [0, 0],
        [0, 0]],

       [[0, 0],
        [1, 0],
        [0, 0]],

       [[0, 0],
        [0, 0],
        [1, 0]]])

And the yx calculation is:

In [197]: np.einsum('a,yxa->yx',amount, route)                                                         
Out[197]: 
array([[   0.,    0.,    0.],
       [5000.,    0.,    0.],
       [   0., 5000.,    0.],
       [   0.,    0., 5000.]])

which is just this slice of route times 5000.

In [198]: route[:,:,0]                                                                                 
Out[198]: 
array([[0, 0, 0],
       [1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])

Omit the x on the RHS of the einsum results in summation across the dimension.

Equivalently we can multiply (with broadcasting):

In [200]: (amount*route).sum(axis=2)                                                                   
Out[200]: 
array([[   0.,    0.,    0.],
       [5000.,    0.,    0.],
       [   0., 5000.,    0.],
       [   0.,    0., 5000.]])
In [201]: (amount*route).sum(axis=(1,2))                                                               
Out[201]: array([   0., 5000., 5000., 5000.])

Maybe looking at amount*route will help visualize the problem. You can also use max, min, argmax etc instead of sum, or along with it on one or more of the axes.

hpaulj
  • 221,503
  • 14
  • 230
  • 353
  • Thanks. Please have a look at my **Solution attempt** section. Seeing an actual algorithm might provide a better explanation. All dimensions (abc) even though they are 1 are required as this is only an example and in reality the dimensions are larger. – orange Jun 06 '19 at 03:01
  • I boiled down the problem to vectorising the operation of `for x in xrange(1, route.shape[0]): r[x] = r[x] & (r[:x] == False).all(axis=0)` in numpy. Any idea? – orange Jun 06 '19 at 04:29