I use np.einsum
to calculate the flow of material in a graph (1 node to 4 nodes in this example). The amount of flow is given by amount
(amount.shape == (1, 1, 2)
the dimensions define certain criteria, let's call them a
, b
, c
).
The boolean matrix route
determines the permissible flow based on the a
, b
, c
criteria into y
(route.shape == (4, 1, 1, 2)
; yabc
). I label the dimensions y
, a
, b
, c
. abc
are equivalent to amount
s dimensions abc
, y
is the direction of the flow (0, 1, 2 or 3). To determine the amount of material in y
, I calculate np.einsum('abc,yabc->y', amount, route)
and get a y-dim vector with the flows into y
. There's also an implicit priorisation of the route. For instance, any route[0, ...] == True
is False
for any y=1..3
, any route[1, ...] == True
is False
for the next higher y-dim routes and so on. route[3, ...]
(last y-index) defines the catch-all route, that is, its values are True
when previous y-index values were False ((route[0] ^ route[1] ^ route[2] ^ route[3]).all() == True
).
This works fine. However, when I introduce another criteria (dimension) x
which only exists in route
, but not in amount
, this logic seems to break. The below code demonstrates the problem:
>>> import numpy as np
>>> amount = np.asarray([[[5000.0, 0.0]]])
>>> route = np.asarray([[[[[False, True]]], [[[False, True]]], [[[False, True]]]], [[[[True, False]]], [[[False, False]]], [[[False, False]]]], [[[[False, False]]], [[[True, False]]], [[[False, False]]]], [[[[False, False]]], [[[False, False]]], [[[True, False]]]]], dtype=bool)
>>> amount.shape
(1, 1, 2)
>>> Added dimension `x`
>>> # y,x,a,b,c
>>> route.shape
(4, 3, 1, 1, 2)
>>> # Attempt 1: `5000` can flow into y=1, 2 or 3. I expect
>>> # `flows1.sum() == amount.sum()` as it would be without `x`.
>>> # Correct solution would be `[0, 5000, 0, 0]` because material is routed
>>> # to y=1, and is not available for y=2 and y=3 as they are lower
>>> # priority (higher index)
>>> flows1 = np.einsum('abc,yxabc->y', amount, route)
>>> flows1
array([ 0., 5000., 5000., 5000.])
>>> # Attempt 2: try to collapse `x` => not much different, duplication
>>> np.einsum('abc,yabc->y', amount, route.any(1))
array([ 0., 5000., 5000., 5000.])
>>> # This is the flow by `y` and `x`. I'd only expect a `5000` in the
>>> # 2nd row (`[5000., 0., 0.]`) not the others.
>>> np.einsum('abc,yxabc->yx', amount, route)
array([[ 0., 0., 0.],
[5000., 0., 0.],
[ 0., 5000., 0.],
[ 0., 0., 5000.]])
Is there any feasible operation which I can apply to route
(.all(1)
doesn't work either) to ignore the x-dimension?
Another example:
>>> amount2 = np.asarray([[[5000.0, 1000.0]]])
>>> np.einsum('abc,yabc->y', amount2, route.any(1))
array([1000., 5000., 5000., 5000.])
can be interpreted as 1000.0
being routed to y=0
(and none of the other y-destinations) and 5000.0
being compatible with destination y=1
, y=2
and y=3
, but ideally, I'd only like to show 5000.0
up in y=1
(as that's the lowest index and highest destination priority).
Solution attempt
The below works, but is not very numpy-ish. It'll be great if the loop could be eliminated.
# Initialise destination
result = np.zeros((route.shape[0]))
# Calculate flow by maintaining all dimensions (this will cause
# double ups because `x` is not part of `amount2`
temp = np.einsum('abc,yxabc->yxabc', amount2, route)
temp_ixs = np.asarray(np.where(temp))
# For each original amount, find the destination (`y`)
for a, b, c in zip(*np.where(amount2)):
# Find where dimensions `abc` are equal in the destination.
# Take the first vector which contains `yxabc` (we get `yx` as result)
ix = np.where((temp_ixs[2:].T == [a, b, c]).all(axis=1))[0][0]
y_ix = temp_ixs.T[ix][0]
# ignored
x_ix = temp_ixs.T[ix][1]
v = amount2[a, b, c]
# build resulting destination
result[y_ix] += v
# result == array([1000., 5000., 0., 0.])
With other words for each value in amount2
, I am looking for the lowest indices yx
in temp
so that the value can be written to result[y] = value
(x is ignored).
>>> temp = np.einsum('abc,yxabc->yx', amount2, route)
>>> temp
# +--- value=1000 at y=0 => result[0] += 1000
# /
array([[1000., 1000., 1000.],
# +--- value=5000 at y=1 => result[1] += 5000
# /
[5000., 0., 0.],
[ 0., 5000., 0.],
[ 0., 0., 5000.]])
>>> result
array([1000., 5000., 0., 0.])
>>> amount2
array([[[5000., 1000.]]])
Another attempt to reduce the dimensionality of route
is:
>>> r = route.any(1)
>>> for x in xrange(1, route.shape[0]):
r[x] = r[x] & (r[:x] == False).all(axis=0)
>>> np.einsum('abc,yabc->y', amount2, r)
array([1000., 5000., 0., 0.])
This essentially preserves above-mentioned priority given by the first dimension of route
. Any lower priority (higher index) array cannot contain a True value when a higher priority array has a value of True already at that sub index. While this is a lot better than my explicit approach, it would be great if the for x in xrange...
loop could be expressed as numpy vector operation.