multivariate_normal(): Why do a manual implementation and the SciPy function give different results?

Question

From The Scipy Documentation

scipy.stats.multivariate_normal

scipy.stats.multivariate_normal = <scipy.stats._multivariate.multivariate_normal_gen object at 0x2b23194d1c90>

A multivariate normal random variable.

The mean keyword specifies the mean. The cov keyword specifies the covariance matrix.

Parameters:
x : array_like Quantiles, with the last axis of x denoting the components.

mean : array_like, optional Mean of the distribution (default zero)

cov : array_like, optional Covariance matrix of the distribution (default one)

allow_singular : bool, optional Whether to allow a singular covariance matrix. (Default: False)

random_state : None or int or np.random.RandomState instance, optional If int or RandomState, use it for drawing the random variates. If None (or np.random), the global np.random state is used. Default is None.

Alternatively, the object may be called (as a function) to fix the mean and covariance parameters, returning a “frozen” multivariate normal random variable:

rv = multivariate_normal(mean=None, cov=1, allow_singular=False)

Frozen object with the same methods but holding the given mean and covariance fixed.

Notes

Setting the parameter mean to None is equivalent to having mean be the zero-vector. The parameter cov can be a scalar, in which case the covariance matrix is the identity times that value, a vector of diagonal entries for the covariance matrix, or a two-dimensional array_like. The covariance matrix cov must be a (symmetric) positive semi-definite matrix. The determinant and inverse of cov are computed as the pseudo-determinant and pseudo-inverse, respectively, so that cov does not need to have full rank.
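As a small illustration of the notes above, the sketch below checks that a scalar, a vector of diagonal entries, and a full two-dimensional array can describe the same covariance, and that a frozen object returns the same log-density. The mean, the evaluation point `x`, and the value 2.0 are arbitrary choices made up for this example.

```python
import numpy as np
from scipy.stats import multivariate_normal

mean = np.array([1.0, -1.0])   # arbitrary example mean
x = np.array([0.5, 0.25])      # arbitrary evaluation point

# Three equivalent ways to specify the covariance 2 * I for a 2-D distribution
lp_scalar = multivariate_normal.logpdf(x, mean=mean, cov=2.0)
lp_vector = multivariate_normal.logpdf(x, mean=mean, cov=[2.0, 2.0])
lp_matrix = multivariate_normal.logpdf(x, mean=mean, cov=2.0 * np.eye(2))

# A frozen object holds the mean and covariance fixed
rv = multivariate_normal(mean=mean, cov=2.0 * np.eye(2))
lp_frozen = rv.logpdf(x)

print(lp_scalar, lp_vector, lp_matrix, lp_frozen)
```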

My Implementation

import numpy as np
from scipy.stats import multivariate_normal

mean_matrix = np.array([mu1['CS Score (USNews)'], mu2['Research Overhead %'], mu3['Admin Base Pay$'], mu4['Tuition(out-state)$']])
print("Mean Matrix : ", mean_matrix)

logLikelihood = multivariate_normal.logpdf(data_frame_to_use, mean=mean_matrix, cov=covarianceMat, allow_singular='False')
print("Log matrix", logLikelihood)
print("Sum of log", sum(logLikelihood))

This gives the output:

 Mean Matrix :  
 [  3.21428571e+00   5.33857143e+01   4.69178816e+05   2.97119592e+04]

 Log matrix 
 [-25.89859216 -25.39255136 -24.90457203 -24.62044334 -25.97797326
  -24.53094475 -24.86379124 -28.10541986 -25.17504371 -24.36097654
  -27.56393633 -26.45706387 -24.73181091 -24.73103739 -25.35676354
  -25.92874579 -27.37586004 -29.54768142 -24.49143024 -25.53990703
  -25.57939464 -26.84501673 -25.33293111 -24.3236322  -24.62756871
  -25.67609413 -26.81881766 -25.163922   -24.99671211 -24.94361195
  -24.93544698 -24.72654802 -24.99845459 -27.3604362  -25.56750359
  -26.8531682  -25.91679777 -27.4626466  -24.59908201 -27.17373079
  -24.91116583 -26.78552165 -27.94191254 -25.32212942 -25.73247674
  -26.51429465 -25.14545746 -24.43274555 -26.08543542]

 Sum of log -1262.32720006
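One detail in the call above may be worth checking: `allow_singular='False'` passes a non-empty string, which Python treats as true, so the call behaves like `allow_singular=True`. The sketch below uses a made-up diagonal covariance (the values are hypothetical, chosen only so that the variances span many orders of magnitude, loosely like the columns in this data set) to show how SciPy can treat such a matrix as numerically singular. Whether this applies to the actual `covarianceMat` is an assumption, since that matrix is not shown in the question.

```python
import numpy as np
from scipy.stats import multivariate_normal

# A non-empty string is truthy, even if it spells "False"
print(bool('False'))  # True

# Hypothetical variances on very different scales (not the asker's data)
cov = np.diag([0.2, 150.0, 1e10, 3e7])
mean = np.zeros(4)
x = np.zeros(4)

# With a real boolean False, SciPy rejects this matrix as singular:
# eigenvalues that are tiny relative to the largest one are treated as zero
try:
    multivariate_normal.logpdf(x, mean=mean, cov=cov, allow_singular=False)
    raised = False
except np.linalg.LinAlgError:
    raised = True
print("LinAlgError with allow_singular=False:", raised)

# With the string 'False' the check is skipped and a value is returned,
# computed from the pseudo-inverse and pseudo-determinant
lp = multivariate_normal.logpdf(x, mean=mean, cov=cov, allow_singular='False')
print(lp)

# np.linalg.det applies no such cutoff to the same matrix
print(np.linalg.det(cov))
```

If something similar happens with the real covariance matrix, the SciPy result and a formula based on `np.linalg.inv` and `np.linalg.det` would be evaluating different densities. Passing a real `False`, or rescaling the columns to comparable magnitudes, would be one way to check.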

But when I apply the formula manually:

# Calculating the PDF for the multivariate case by implementing the formula
import math as mt
import numpy as np
import xlrd

filepath = "./DataSet/university_data.xlsx"
workbook = xlrd.open_workbook(filepath)
sheet = workbook.sheet_by_index(0)

print("\n")
print("PDF values for each row :")
total = 0  # running sum of log-PDF values (avoids shadowing the built-in sum)
for row in range(1, 50):
    # Take columns 2-5 of each data row as the input vector
    sum_array = sheet.row_values(row, 2, 6)
    # Formula implementation
    l = np.subtract(sum_array, mean_matrix)  # deviation from the mean
    m = np.matrix.transpose(l)
    n = np.linalg.inv(covarianceMat)
    ex = np.exp(-0.5 * np.dot(np.dot(m, n), l))
    # Normalising constant 1 / ((2*pi)^(k/2) * sqrt(det)), with k = 4 and pi taken as 3.14
    f = 1 / (pow(2 * 3.14, 2) * pow(np.linalg.det(covarianceMat), 0.5))
    pdf = f * ex
    # print("PDF for row ", row, " \t:\t\t ", pdf)
    lpdf = mt.log(pdf)
    print("LogPDF for row ", row, " \t:\t\t ", lpdf)
    total = total + lpdf

print("\n")
print("Loglikelihood(formula implementation) : ", '%.3f' % float(total))

This gives the output:

PDF values for each row :
LogPDF for row  1     :         -30.19371482325026
LogPDF for row  2     :         -27.067463935229377
LogPDF for row  3     :         -26.980621478218485
LogPDF for row  4     :         -26.08324487529128
LogPDF for row  5     :         -27.26413815992288
LogPDF for row  6     :         -26.70724095561742
LogPDF for row  7     :         -25.92834293415632
LogPDF for row  8     :         -28.712707460803784
LogPDF for row  9     :         -25.909899171974057
LogPDF for row  10    :         -25.92659040666956
LogPDF for row  11    :         -28.211108799821343
LogPDF for row  12    :         -26.83353978964118
LogPDF for row  13    :         -25.328814199145395
LogPDF for row  14    :         -25.111128380106702
LogPDF for row  15    :         -25.749393390363533
LogPDF for row  16    :         -26.693092607802864
LogPDF for row  17    :         -28.170941906181092
LogPDF for row  18    :         -29.927055551899382
LogPDF for row  19    :         -24.86790828974201
LogPDF for row  20    :         -26.137292962654886
LogPDF for row  21    :         -25.96539688885015
LogPDF for row  22    :         -27.35065704563797
LogPDF for row  23    :         -25.92368769313743
LogPDF for row  24    :         -24.70248175116109
LogPDF for row  25    :         -25.06362483723621
LogPDF for row  26    :         -26.268885215917194
LogPDF for row  27    :         -27.864039083501908
LogPDF for row  28    :         -25.57129689424411
LogPDF for row  29    :         -25.389635711130655
LogPDF for row  30    :         -25.328718626588117
LogPDF for row  31    :         -25.908499721053612
LogPDF for row  32    :         -25.274591177021158
LogPDF for row  33    :         -25.864730872696875
LogPDF for row  34    :         -28.250699466070667
LogPDF for row  35    :         -26.427070541801417
LogPDF for row  36    :         -28.480271709879336
LogPDF for row  37    :         -26.304600263886595
LogPDF for row  38    :         -29.079517786952714
LogPDF for row  39    :         -25.167192328059414
LogPDF for row  40    :         -27.552414021501523
LogPDF for row  41    :         -25.576316408257583
LogPDF for row  42    :         -27.164966750624057
LogPDF for row  43    :         -28.738620620446113
LogPDF for row  44    :         -25.85303976738355
LogPDF for row  45    :         -26.28781267577098
LogPDF for row  46    :         -27.08876652783593
LogPDF for row  47    :         -25.81071248417473
LogPDF for row  48    :         -25.77382834320652
LogPDF for row  49    :         -26.892236096254386

Loglikelihood(formula implementation) :  -1304.729
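For comparison, here is a minimal sketch of the manual formula, log p(x) = -0.5 * (k*log(2*pi) + log det(cov) + (x - mean)^T cov^-1 (x - mean)), written with `np.pi`, `np.linalg.slogdet`, and `np.linalg.solve`. The data, mean, and covariance are randomly generated stand-ins (the original spreadsheet is not reproduced here), and the helper name `manual_logpdf` is hypothetical. On a well-conditioned covariance like this one, the result should agree with `multivariate_normal.logpdf`.

```python
import numpy as np
from scipy.stats import multivariate_normal

def manual_logpdf(X, mean, cov):
    """Log-density of N(mean, cov) at each row of X (cov must be non-singular)."""
    k = mean.shape[0]
    dev = X - mean                                   # deviations, shape (n, k)
    sign, logdet = np.linalg.slogdet(cov)            # numerically stable log-determinant
    # Squared Mahalanobis distance for every row, without forming inv(cov)
    maha = np.sum(dev * np.linalg.solve(cov, dev.T).T, axis=1)
    return -0.5 * (k * np.log(2.0 * np.pi) + logdet + maha)

# Synthetic stand-in data: 49 rows, 4 columns on comparable scales
rng = np.random.default_rng(0)
X = rng.normal(size=(49, 4))
mean = X.mean(axis=0)
cov = np.cov(X, rowvar=False)

manual = manual_logpdf(X, mean, cov)
reference = multivariate_normal.logpdf(X, mean=mean, cov=cov)

print(np.allclose(manual, reference))
print(manual.sum(), reference.sum())
```

Working with the log-determinant and a linear solve avoids taking `log` of a very small `pdf` value and avoids an explicit matrix inverse, both of which can lose precision when the covariance entries differ widely in scale.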

Input data: the data set used above

Python notebook: my implementation

π ≠ 3.14. I haven't checked if the difference is enough to explain your results, but you should definitely use a more accurate approximation, such as `math.pi` or `numpy.pi`. — Warren Weckesser, Sep 24 '17 at 22:48
Really need a [mcve] to help much more than that I think. Currently "minimal" doesn't really apply, and "complete" (as in, what formula are you trying to implement) is questionable. Also, I edited the docs to be more readable, but you could probably trim parts that aren't necessary (or remove them entirely if you aren't referencing them) in order to make what you're looking for a little more clear. — Daniel F, Sep 25 '17 at 06:15
Any luck with this ? I also have different values between my custom implementation and the scipy implementation — Xavier Bourret Sicotte, Jun 19 '18 at 14:49
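The effect of using 3.14 in place of π can be estimated directly, since π enters the log-density only through the constant term -(k/2)*log(2π). The short calculation below, which assumes k = 4 variables and the 49 rows summed in the question, compares that shift with the gap between the two reported totals.

```python
import math

k = 4          # number of variables in the question
n_rows = 49    # rows summed in the question

# Per-row change in the log-density from replacing pi with 3.14
per_row_shift = (k / 2) * (math.log(2 * math.pi) - math.log(2 * 3.14))
total_shift = n_rows * per_row_shift

# Gap between the two totals reported in the question
reported_gap = abs(-1304.729 - (-1262.32720006))

print(f"shift from pi approximation: {total_shift:.4f}")
print(f"reported gap:                {reported_gap:.4f}")
```

The shift comes to about 0.05 in total, far smaller than the reported gap of roughly 42.4, so the approximation of π alone does not account for the difference.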

