How can I calculate the probability for every row of a NumPy array at once?

Question

I have a function that calculates the probability density of a multivariate Gaussian distribution:

import math
import numpy as np

def multinormpdf(x, mu, var):  # probability density of a multivariate Gaussian
    k = len(x)
    det = np.linalg.det(var)
    inv = np.linalg.inv(var)
    denominator = math.sqrt(((2*math.pi)**k)*det)
    numerator = np.dot((x - mu).transpose(), inv)  # (x - mu)^T var^-1
    numerator = np.dot(numerator, (x - mu))        # ... times (x - mu)
    numerator = math.exp(-0.5 * numerator)
    return numerator/denominator
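For reference, multinormpdf is intended to evaluate the density of a k-dimensional normal distribution with mean vector μ (mu) and covariance matrix Σ (var). Only the quadratic form in the exponent depends on x, so the determinant and inverse of Σ could be computed once and reused for every row:

```latex
p(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^k \det \Sigma}}
  \exp\!\left(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x}-\boldsymbol{\mu})\right)
```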

I also have a mean vector, a covariance matrix, and a 2D NumPy array of test data:

mu = np.array([100, 105, 42]) # mean vector
var = np.array([[100, 124, 11], # covariance matrix
               [124, 150, 44],
               [11, 44, 130]])

arr = np.array([[42, 234, 124],  # arr is a 43923794 x 3 matrix
                [123, 222, 112],
                [42, 213, 11],
                # ... (many more rows, about 40,000,000 in total)
                [23, 55, 251]])

I need the probability for every row, so I used this loop:

for i in arr:
    print(multinormpdf(i, mu, var)) # I already know mean_vector and variance_matrix

But it is very slow.

Is there a faster way to calculate these probabilities? Or is there a way to compute them for the whole test array at once, like a batch?

@Nils Werner I think that is not important. But I updated code for `normpdf` — YeongHwa Jin, Nov 22 '18 at 02:36
Your code is not valid Python code, and does not work, even after fixing the syntax issues. Please post a proper [MVCE](https://stackoverflow.com/help/mcve)! — Nils Werner, Nov 22 '18 at 12:09

Nils Werner · Accepted Answer · 2018-11-22T09:16:13.103

You can vectorize your function easily:

import numpy as np

def fast_multinormpdf(x, mu, var):
    mu = np.asarray(mu)
    var = np.asarray(var)
    k = x.shape[-1]
    det = np.linalg.det(var)
    inv = np.linalg.inv(var)
    denominator = np.sqrt(((2*np.pi)**k)*det)
    numerator = np.dot((x - mu), inv)
    numerator = np.sum((x - mu) * numerator, axis=-1)
    numerator = np.exp(-0.5 * numerator)
    return numerator/denominator


arr = np.array([[42, 234, 124],
                [123, 222, 112],
                [42, 213, 11],
                [42, 213, 11]])

mu = [0, 0, 1]
var = [[1, 100, 100],
       [100, 1, 100],
       [100, 100, 1]]

slow_out = np.array([multinormpdf(i, mu, var) for i in arr])
fast_out = fast_multinormpdf(arr, mu, var)

np.allclose(slow_out, fast_out) # True
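With tens of millions of rows, densities far from the mean can underflow to exactly 0.0 in float64, and forming the explicit inverse is less accurate than factoring the matrix. A common alternative, sketched below as an assumption rather than part of this answer, works in log space with a Cholesky factorization; the name multinorm_logpdf is hypothetical. It requires var to be symmetric positive definite, as any valid covariance matrix is. Neither the demo var above nor the var in the question satisfies this (both have a negative leading 2 x 2 minor), so np.linalg.cholesky would raise LinAlgError for them.

```python
import numpy as np

def multinorm_logpdf(x, mu, var):
    """Hypothetical helper: log-density of a multivariate normal for every row of x.

    Assumes var is symmetric positive definite; np.linalg.cholesky raises
    np.linalg.LinAlgError otherwise.
    """
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    var = np.asarray(var, dtype=float)
    k = x.shape[-1]
    # Factor var = L @ L.T once instead of forming the explicit inverse.
    L = np.linalg.cholesky(var)
    # Solve L @ z = (x - mu).T for all rows at once; column i of z is the
    # "whitened" deviation of row i.
    z = np.linalg.solve(L, (x - mu).T)
    # (x - mu)^T var^-1 (x - mu) equals ||z||^2 for each row.
    maha = np.sum(z * z, axis=0)
    # log det(var) = 2 * sum(log(diag(L)))
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (k * np.log(2 * np.pi) + log_det + maha)
```

np.exp(multinorm_logpdf(arr, mu, var)) gives the same values as fast_multinormpdf for a positive definite var, but keeping the result as a log-density avoids the underflow.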

fast_multinormpdf is roughly 800 times faster than the unvectorized loop (2.12 s vs. 2.56 ms for 40,000 rows):

long_arr = np.tile(arr, (10000, 1))

%timeit np.array([multinormpdf(i, mu, var) for i in long_arr])
# 2.12 s ± 93.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit fast_multinormpdf(long_arr, mu, var)
# 2.56 ms ± 76.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
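For the roughly 44 million rows in the question, each temporary array that fast_multinormpdf creates (x - mu, its product with inv, and so on) is about as large as arr itself, around 1 GB in 64-bit precision. A minimal sketch, assuming memory is the constraint, processes the rows in fixed-size chunks and writes into a preallocated output; the helper name multinormpdf_chunked and the default chunk_size are hypothetical choices, not part of the answer.

```python
import numpy as np

def multinormpdf_chunked(x, mu, var, chunk_size=1_000_000):
    """Hypothetical helper: vectorized multivariate normal pdf over the rows of x,
    processed in chunks so temporaries are at most chunk_size rows."""
    x = np.asarray(x)
    mu = np.asarray(mu, dtype=float)
    var = np.asarray(var, dtype=float)
    k = x.shape[-1]
    # These depend only on var, so compute them once for all chunks.
    inv = np.linalg.inv(var)
    denominator = np.sqrt((2 * np.pi) ** k * np.linalg.det(var))
    out = np.empty(x.shape[0], dtype=float)
    for start in range(0, x.shape[0], chunk_size):
        stop = start + chunk_size                # slicing past the end is fine
        d = x[start:stop] - mu                   # (chunk, k) deviations
        quad = np.sum((d @ inv) * d, axis=-1)    # row-wise quadratic form
        out[start:stop] = np.exp(-0.5 * quad) / denominator
    return out
```

A larger chunk_size means fewer Python-level iterations; a smaller one lowers peak memory.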

Thanks Nils! I appreciate your advice! – YeongHwa Jin, Nov 22 '18 at 13:24

score 1 · Answer 2 · answered Nov 21 '18 at 20:28


You can try numba. Just decorate your function with @numba.vectorize.

import numba

@numba.vectorize
def multinormpdf(x, mu, var):
    # ...
    return calculated_probability

new_arr = multinormpdf(arr)

If your multinormpdf doesn't contain any unsupported functions, it can be accelerated. See the list of supported NumPy features here: https://numba.pydata.org/numba-doc/dev/reference/numpysupported.html

You can also try the experimental target='parallel' option:

@numba.vectorize(target='parallel')

answered Nov 21 '18 at 20:28 by anch2150

My input for `multinormpdf` is already a NumPy array (a row such as [42, 234, 124] or [123, 222, 112]), not a scalar, so the call looks like `multinormpdf([51, 23, 251], mu_vector, cov_matrix)`. Can I still use @numba.vectorize? – YeongHwa Jin Nov 22 '18 at 03:41
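On the question in this comment: plain @numba.vectorize builds a ufunc from a function of scalars, so it does not directly accept one row per call. NumPy's own np.vectorize takes a signature argument describing core dimensions, which does let a row-wise function be mapped over arr, as sketched below (row_pdf and row_pdf_batched are hypothetical names). np.vectorize is a convenience wrapper that still loops in Python, so it gives no speedup over the original for loop. Numba's @numba.guvectorize decorator uses the same kind of layout string, e.g. '(n),(n),(n,n)->()', and compiles the loop, but that variant is not shown here.

```python
import numpy as np

def row_pdf(x, mu, var):
    """Hypothetical helper: multivariate normal pdf for a single row x (length k)."""
    k = x.shape[0]
    d = x - mu
    quad = d @ np.linalg.solve(var, d)   # d^T var^-1 d without forming the inverse
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** k * np.linalg.det(var))

# Core dimensions: x and mu are length-n vectors, var is n x n, the output is a
# scalar. np.vectorize loops (in Python) over the remaining batch dimensions of x.
row_pdf_batched = np.vectorize(row_pdf, signature='(n),(n),(n,n)->()', otypes=[float])
```

Calling row_pdf_batched(arr, mu, var) returns one density per row of arr.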
