
I'm looking to perform this calculation as quickly as possible. I have X as an n x m NumPy array. I want to define Y as follows:

Y_11 = 1 / (exp(X_11 - X_11) + exp(X_11 - X_12) + ... + exp(X_11 - X_1m))

or, in NumPy terms, for Y[0,0]:

1/np.sum(np.exp(X[0,0]-X[0,:]))

So basically, Y is also n x m, where the (i,j) element is 1 / sum_j' exp(X_ij - X_ij').

Any tips would be great! Thanks.

Sample code as requested:

np.random.seed(111)
J,K = 110,120
X = np.random.rand(J,K)
Y = np.zeros((J,K))
for j in range(J):
    for k in range(K):
        Y[j,k] = 1/np.sum(np.exp(X[j,k]-X[j,:]))

# note each row will sum to 1 under this operation
np.sum(Y,axis=1)
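
Note that this Y is exactly the row-wise softmax of -X, since 1/sum_k' exp(X[j,k] - X[j,k']) = exp(-X[j,k]) / sum_k' exp(-X[j,k']). A quick check (assumes SciPy >= 1.2, which provides scipy.special.softmax):

from scipy.special import softmax
assert np.allclose(Y, softmax(-X, axis=1))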
Kevin

2 Answers


Here are a few fully vectorized approaches -

def vectorized_app1(X):
    # broadcast a (J,K,K) array of pairwise differences, exponentiate, reduce
    return 1/np.exp(X[:,None] - X[...,None]).sum(1)

def vectorized_app2(X):
    # exp(A-B) = exp(A)/exp(B): exponentiate once, then broadcast the division
    exp_vals = np.exp(X)
    return 1/(exp_vals[:,None]/exp_vals[...,None]).sum(1)

def vectorized_app3(X):
    # einsum does the divide-and-sum over k without materializing a (J,K,K) array
    exp_vals = np.exp(X)
    return 1/np.einsum('ij,ik->ij',exp_vals,1/exp_vals)

Tricks used and lessons learnt

  • Extend dimensions with None/np.newaxis to bring in broadcasting and perform everything in a vectorized manner.

  • np.exp is an expensive operation, so applying it to a huge broadcast array would be costly. The trick used here is exp(A-B) = exp(A)/exp(B): perform np.exp(X) once upfront and then do broadcast divisions.

  • Those sum-reductions can be implemented with np.einsum. This brings memory efficiency, as we never have to create the huge broadcast arrays, and that results in a massive performance boost (see the quick check after this list).
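
As a quick sanity check, all three approaches agree to floating-point tolerance; a minimal sketch (X_chk is just a small illustrative input):

X_chk = np.random.rand(5, 7)
ref = vectorized_app1(X_chk)
assert np.allclose(ref, vectorized_app2(X_chk))
assert np.allclose(ref, vectorized_app3(X_chk))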

Runtime test -

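The timings reference original_approach, which is assumed here to be just the question's double loop wrapped in a function:

def original_approach(X):
    # reference implementation: the question's element-wise double loop
    J, K = X.shape
    Y = np.zeros((J, K))
    for j in range(J):
        for k in range(K):
            Y[j, k] = 1/np.sum(np.exp(X[j, k] - X[j, :]))
    return Y
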
In [111]: # Setup input, X
     ...: np.random.seed(111)
     ...: J,K = 110,120
     ...: X = np.random.rand(J,K)
     ...: 

In [112]: %timeit original_approach(X)
1 loop, best of 3: 322 ms per loop

In [113]: %timeit vectorized_app1(X)
10 loops, best of 3: 124 ms per loop

In [114]: %timeit vectorized_app2(X)
100 loops, best of 3: 14.6 ms per loop

In [115]: %timeit vectorized_app3(X)
100 loops, best of 3: 3.01 ms per loop

Looks like einsum showed its magic again with 100x+ speedup!

Divakar
  • Very useful answer. Thanks for all the explanations. – Ami Tavory Sep 20 '16 at 07:14
  • Thank you! One issue however is sometimes A or B has large values so I get an error. My thought about going with exp(A-B) is that I shrink a lot of them and can put in an np.min(A-B, 200) or similar to avoid errors. Any other ideas? – Kevin Sep 20 '16 at 14:43
  • @Kevin What exactly is the error? What does it say? – Divakar Sep 20 '16 at 14:58
  • overflow in exp. I just discovered bigfloat but will need to spend time seeing how to implement with my code -- and see if it works with knitro. – Kevin Sep 20 '16 at 15:36
  • @Kevin Hmm or I guess you can stick to `vectorized_app1` for those cases. – Divakar Sep 20 '16 at 15:55
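
One standard way to address the overflow raised in the comments is the usual max-shift softmax trick (not proposed in the thread; stable_approach below is an illustrative sketch): since Y is the row-wise softmax of -X, and softmax is invariant to per-row shifts, shifting each row of -X by its maximum keeps every exponent non-positive so np.exp cannot overflow:

def stable_approach(X):
    # shift each row of -X by its max: Y is unchanged by per-row shifts,
    # and all exponents become <= 0, so np.exp stays in (0, 1]
    neg = -X - (-X).max(axis=1, keepdims=True)
    exp_vals = np.exp(neg)
    return exp_vals/exp_vals.sum(axis=1, keepdims=True)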

Here's a first step at reducing the double loop:

def foo2(X):
    Y = np.zeros_like(X)
    for k in range(X.shape[1]):
        # X[:,[k]] keeps a (J,1) column, so the subtraction broadcasts across each row
        Y[:,k] = 1/np.exp(X[:,[k]]-X[:,:]).sum(axis=1)
    return Y

I suspect I can also remove the k loop, but I have to spend more time figuring out just what it is doing. That X[:,[k]]-X[:,:] isn't obvious (in the big picture).

Another step:

Z = np.stack([X[:,[k]]-X for k in range(X.shape[1])],2)
Y = 1/np.exp(Z).sum(axis=1)

which can be further refined (with too much trial and error) with

Z = X[:,None,:]-X[:,:,None]
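
Putting that refinement together as a complete function (a sketch completing the final step; foo3 is a hypothetical name continuing the answer's numbering):

def foo3(X):
    # Z[j, k', k] = X[j, k] - X[j, k'], matching the stacked version above,
    # so summing exp(Z) over axis 1 reproduces the original denominator
    Z = X[:, None, :] - X[:, :, None]
    return 1/np.exp(Z).sum(axis=1)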
hpaulj
  • Great answer. Very minor comment: in your last parentheses, did you possibly mean "*without* too much trial and error". – Ami Tavory Sep 20 '16 at 07:16
  • No, I tried a number of alternatives. There are various ways in which `X` can be expanded to 3d. And 3 axis that it can be summed on. I wasn't as systematic as I could have been. – hpaulj Sep 20 '16 at 07:36