Efficient way to create the probability distribution of a list of numbers with numpy

Question

This is an example of what I am trying to do. Suppose the following numpy array:

import numpy as np

A = np.array([3, 0, 1, 5, 7])  # in practice, A is a huge array of floats: A.shape[0] >= 1000000

I need the fastest possible way to get the following result:

result = []

for a in A:
    result.append(1 / np.exp(A - a).sum())

result = np.array(result)

print(result)

>>> [1.58297157e-02 7.88115138e-04 2.14231906e-03 1.16966657e-01 8.64273193e-01]

Option 1 (faster than previous code):

result = 1 / np.exp(A - A[:,None]).sum(axis=1)

print(result)

>>> [1.58297157e-02 7.88115138e-04 2.14231906e-03 1.16966657e-01 8.64273193e-01]

Is there a faster way to get `result`?

To be clear: the goal is that the values in `result` sum to 1, and each is proportional to e to the power of the corresponding original value? — Karl Knechtel, Jan 25 '22 at 21:50
Yes. result sum to 1. But I cannot directly do the exponentials because the values in A are very large — isedgar, Jan 25 '22 at 21:55
Are you looking for something like this: https://docs.scipy.org/doc/scipy/reference/generated/scipy.special.softmax.html — j1-lee, Jan 25 '22 at 21:56
"But I cannot directly do the exponentials because the values in A are very large" Oh, I answered too quickly, then. Are they within some small range of each other? Can you subtract a constant offset first? — Karl Knechtel, Jan 25 '22 at 21:58

score 3 · Answer 1 · answered Jan 25 '22 at 21:57

Rather than trying to compute each value by normalizing it in place (effectively adding up all the values, repeatedly for each value), instead just get the exponentials and then normalize once at the end. So:

raw = np.exp(A)
result = raw / sum(raw)

(In my testing, the builtin sum is over 2.5x as fast as np.sum for summing a small array. I did not test with larger ones.)
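As the comments point out, `np.exp(A)` overflows when the values in A are very large. Since 1 / sum(exp(A - a)) equals exp(a) / sum(exp(A)), subtracting any constant from every element leaves the result unchanged, so one possible workaround is to subtract the maximum before exponentiating. The following is a minimal sketch of that idea, not a tuned implementation; the function name `stable_softmax` is made up for illustration, and it assumes a non-empty one-dimensional array.

```python
import numpy as np

def stable_softmax(values):
    """Return exp(values) normalized to sum to 1, avoiding overflow."""
    values = np.asarray(values, dtype=float)
    # Shifting by the maximum keeps every exponent <= 0, so np.exp cannot overflow.
    # The shift cancels out in the ratio, so the result is unchanged.
    shifted = values - values.max()
    raw = np.exp(shifted)
    return raw / raw.sum()

A = np.array([3, 0, 1, 5, 7])
result = stable_softmax(A)

# Same distribution even when the inputs are far too large for a plain np.exp
large = stable_softmax(A + 1000.0)
```

This needs only O(N) time and memory, whereas the broadcast in Option 1 builds an N x N intermediate array.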

Karl Knechtel


I cannot do np.exp(A) because the values in A are too large for np.exp; it overflows. scipy.special.softmax handles that issue. Thank you. – isedgar Jan 25 '22 at 22:33

score 0 · Accepted Answer · answered Jan 25 '22 at 22:30

Yes, scipy.special.softmax did the trick:

from scipy.special import softmax

result = softmax(A)

Thank you @j1-lee and @Karl Knechtel
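As a quick sanity check, the sketch below compares `softmax` against the loop from the question on the small example array, and then on a shifted copy whose values would overflow a direct `np.exp`. The `+ 1000.0` offset is an arbitrary choice for illustration, not a value from the original data.

```python
import numpy as np
from scipy.special import softmax

A = np.array([3, 0, 1, 5, 7], dtype=float)

# Reference result: the explicit loop from the question
loop_result = np.array([1 / np.exp(A - a).sum() for a in A])

# One-call equivalent
fast_result = softmax(A)
matches_loop = np.allclose(loop_result, fast_result)

# Inputs around 1000 overflow np.exp directly, but softmax still returns
# the same distribution because a constant shift does not change the ratios.
big = A + 1000.0
matches_shifted = np.allclose(softmax(big), fast_result)

print(matches_loop, matches_shifted)
```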

isedgar
