From the SciPy documentation
scipy.stats.multivariate_normal
scipy.stats.multivariate_normal = <scipy.stats._multivariate.multivariate_normal_gen object at 0x2b23194d1c90>
A multivariate normal random variable.
The mean keyword specifies the mean. The cov keyword specifies the covariance matrix.
Parameters:
x : array_like
    Quantiles, with the last axis of x denoting the components.
mean : array_like, optional
    Mean of the distribution (default zero)
cov : array_like, optional
    Covariance matrix of the distribution (default one)
allow_singular : bool, optional
    Whether to allow a singular covariance matrix. (Default: False)
random_state : None or int or np.random.RandomState instance, optional
    If int or RandomState, use it for drawing the random variates. If None (or np.random), the global np.random state is used. Default is None.
Alternatively, the object may be called (as a function) to fix the mean and covariance parameters, returning a “frozen” multivariate normal random variable:
rv = multivariate_normal(mean=None, cov=1, allow_singular=False)
    Frozen object with the same methods but holding the given mean and covariance fixed.
Notes
Setting the parameter mean to None is equivalent to having mean be the zero-vector. The parameter cov can be a scalar, in which case the covariance matrix is the identity times that value, a vector of diagonal entries for the covariance matrix, or a two-dimensional array_like. The covariance matrix cov must be a (symmetric) positive semi-definite matrix. The determinant and inverse of cov are computed as the pseudo-determinant and pseudo-inverse, respectively, so that cov does not need to have full rank.
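The pseudo-determinant and pseudo-inverse mentioned in the Notes can be illustrated with a small sketch. It uses an eigendecomposition and discards eigenvalues below a relative cutoff. The helper name `pseudo_logdet_and_inverse` and the cutoff `rel_tol` are assumptions made for illustration and are not SciPy's actual internals.

```python
import numpy as np

def pseudo_logdet_and_inverse(cov, rel_tol=1e-10):
    """Hypothetical helper: log pseudo-determinant, pseudo-inverse and rank
    of a symmetric PSD matrix. rel_tol is an illustrative cutoff, not SciPy's."""
    eigvals, eigvecs = np.linalg.eigh(cov)        # symmetric eigendecomposition
    keep = eigvals > rel_tol * eigvals.max()      # drop (near-)zero eigenvalues
    kept_vals = eigvals[keep]
    kept_vecs = eigvecs[:, keep]
    log_pdet = np.sum(np.log(kept_vals))          # log of the product of kept eigenvalues
    pinv = (kept_vecs / kept_vals) @ kept_vecs.T  # V diag(1/s) V^T
    rank = int(keep.sum())
    return log_pdet, pinv, rank

# Full-rank case: agrees with the ordinary log-determinant and inverse.
full = np.array([[2.0, 0.5], [0.5, 1.0]])
log_pdet, pinv, rank = pseudo_logdet_and_inverse(full)
print(rank,
      np.isclose(log_pdet, np.linalg.slogdet(full)[1]),
      np.allclose(pinv, np.linalg.inv(full)))

# Rank-deficient case: the ordinary determinant is 0, the pseudo-determinant is not.
singular = np.array([[1.0, 1.0], [1.0, 1.0]])
log_pdet_s, pinv_s, rank_s = pseudo_logdet_and_inverse(singular)
print(rank_s, np.exp(log_pdet_s))
```

When the columns are on very different scales, the smallest eigenvalue of the covariance matrix can be tiny relative to the largest, so whether it is kept or dropped depends on the cutoff in use. That may be worth checking when a library result and a plain `inv`/`det` calculation disagree.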
My Implementation
import numpy as np
from scipy.stats import multivariate_normal
# mu1..mu4, covarianceMat and data_frame_to_use are assumed to be defined earlier in the notebook
mean_matrix = np.array([mu1['CS Score (USNews)'], mu2['Research Overhead %'], mu3['Admin Base Pay$'], mu4['Tuition(out-state)$']])
print("Mean Matrix : ", mean_matrix)
logLikelihood = multivariate_normal.logpdf(data_frame_to_use, mean=mean_matrix, cov=covarianceMat, allow_singular='False')
print("Log matrix", logLikelihood)
print("Sum of log", sum(logLikelihood))
Which gives the output:
Mean Matrix :
[ 3.21428571e+00 5.33857143e+01 4.69178816e+05 2.97119592e+04]
Log matrix
[-25.89859216 -25.39255136 -24.90457203 -24.62044334 -25.97797326
-24.53094475 -24.86379124 -28.10541986 -25.17504371 -24.36097654
-27.56393633 -26.45706387 -24.73181091 -24.73103739 -25.35676354
-25.92874579 -27.37586004 -29.54768142 -24.49143024 -25.53990703
-25.57939464 -26.84501673 -25.33293111 -24.3236322 -24.62756871
-25.67609413 -26.81881766 -25.163922 -24.99671211 -24.94361195
-24.93544698 -24.72654802 -24.99845459 -27.3604362 -25.56750359
-26.8531682 -25.91679777 -27.4626466 -24.59908201 -27.17373079
-24.91116583 -26.78552165 -27.94191254 -25.32212942 -25.73247674
-26.51429465 -25.14545746 -24.43274555 -26.08543542]
Sum of log -1262.32720006
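One detail in the `logpdf` call above may be worth double-checking: `allow_singular='False'` passes a string, not the boolean `False`. In Python any non-empty string is truthy, so if the library only tests the flag for truthiness (an assumption about SciPy's internals, not something verified here), the string would behave like `True`. The sketch below shows only the language-level fact; `would_reject_singular` is a hypothetical stand-in, not a SciPy function.

```python
# A non-empty string is truthy in Python, whatever text it contains.
flag_as_string = 'False'
flag_as_bool = False

print(bool(flag_as_string))   # truthiness of the string
print(bool(flag_as_bool))     # truthiness of the boolean

def would_reject_singular(flag):
    """Hypothetical stand-in for a library check written as `if not flag`."""
    return not flag

# The two spellings lead to different decisions in such a check.
print(would_reject_singular(flag_as_string), would_reject_singular(flag_as_bool))
```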
But when I apply the formula manually:
# Calculating the PDF for the multivariate case by implementing the formula
import math as mt  # mt is assumed to be the math module (its import is not shown in the post)
import numpy as np
import xlrd
filepath = "./DataSet/university_data.xlsx"
workbook = xlrd.open_workbook(filepath)
sheet = workbook.sheet_by_index(0)
print("\n")
print("PDF values for each row :")
total = 0  # running sum of the log-pdf values (avoids shadowing the built-in sum)
for row in range(1, 50):
    # take columns 2..5 of each data row as the input vector
    sum_array = sheet.row_values(row, 2, 6)
    # formula implementation
    l = np.subtract(sum_array, mean_matrix)
    m = np.matrix.transpose(l)  # no-op for a 1-D array
    n = np.linalg.inv(covarianceMat)
    ex = np.exp(-0.5 * np.dot(np.dot(m, n), l))
    # 3.14 approximates pi; the exponent 2 is d/2 for d = 4 dimensions
    f = 1 / (pow(2 * 3.14, 2) * pow(np.linalg.det(covarianceMat), 0.5))
    pdf = f * ex
    # print("PDF for row ",row," \t:\t\t ",pdf)
    lpdf = mt.log(pdf)
    print("LogPDF for row ", row, " \t:\t\t ", lpdf)
    total = total + lpdf
print("\n")
print("Loglikelihood(formula implementation) : ", '%.3f' % float(total))
Which gives the output:
> PDF values for each row :
>
> LogPDF for row 1 : -30.19371482325026
>
> LogPDF for row 2 : -27.067463935229377
>
> LogPDF for row 3 : -26.980621478218485
>
> LogPDF for row 4 : -26.08324487529128
>
> LogPDF for row 5 : -27.26413815992288
>
> LogPDF for row 6 : -26.70724095561742
>
> LogPDF for row 7 : -25.92834293415632
>
> LogPDF for row 8 : -28.712707460803784
>
> LogPDF for row 9 : -25.909899171974057
>
> LogPDF for row 10 : -25.92659040666956
>
> LogPDF for row 11 : -28.211108799821343
>
> LogPDF for row 12 : -26.83353978964118
>
> LogPDF for row 13 : -25.328814199145395
>
> LogPDF for row 14 : -25.111128380106702
>
> LogPDF for row 15 : -25.749393390363533
>
> LogPDF for row 16 : -26.693092607802864
>
> LogPDF for row 17 : -28.170941906181092
>
> LogPDF for row 18 : -29.927055551899382
>
> LogPDF for row 19 : -24.86790828974201
>
> LogPDF for row 20 : -26.137292962654886
>
> LogPDF for row 21 : -25.96539688885015
>
> LogPDF for row 22 : -27.35065704563797
>
> LogPDF for row 23 : -25.92368769313743
>
> LogPDF for row 24 : -24.70248175116109
>
> LogPDF for row 25 : -25.06362483723621
>
> LogPDF for row 26 : -26.268885215917194
>
> LogPDF for row 27 : -27.864039083501908
>
> LogPDF for row 28 : -25.57129689424411
>
> LogPDF for row 29 : -25.389635711130655
>
> LogPDF for row 30 : -25.328718626588117
>
> LogPDF for row 31 : -25.908499721053612
>
> LogPDF for row 32 : -25.274591177021158
>
> LogPDF for row 33 : -25.864730872696875
>
> LogPDF for row 34 : -28.250699466070667
>
> LogPDF for row 35 : -26.427070541801417
>
> LogPDF for row 36 : -28.480271709879336
>
> LogPDF for row 37 : -26.304600263886595
>
> LogPDF for row 38 : -29.079517786952714
>
> LogPDF for row 39 : -25.167192328059414
>
> LogPDF for row 40 : -27.552414021501523
>
> LogPDF for row 41 : -25.576316408257583
>
> LogPDF for row 42 : -27.164966750624057
>
> LogPDF for row 43 : -28.738620620446113
>
> LogPDF for row 44 : -25.85303976738355
>
> LogPDF for row 45 : -26.28781267577098
>
> LogPDF for row 46 : -27.08876652783593
>
> LogPDF for row 47 : -25.81071248417473
>
> LogPDF for row 48 : -25.77382834320652
>
> LogPDF for row 49 : -26.892236096254386
>
>
> Loglikelihood(formula implementation) : -1304.729
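The manual formula uses `3.14` in place of π. A quick calculation, sketched below for 4 dimensions and 49 rows as in the code above, shows how much of the gap between the two reported sums that approximation could account for. The variable names are illustrative only.

```python
import math

n_dims = 4    # number of columns used in the manual formula
n_rows = 49   # rows 1..49 in the loop

# Normalising constant in log form: -(d/2) * log(2*pi)
log_const_exact = -(n_dims / 2) * math.log(2 * math.pi)
log_const_approx = -(n_dims / 2) * math.log(2 * 3.14)

per_row_shift = log_const_approx - log_const_exact  # effect on each row's log-pdf
total_shift = n_rows * per_row_shift                # effect on the summed log-likelihood

observed_gap = -1304.729 - (-1262.32720006)         # manual sum minus scipy sum

print(f"per-row shift: {per_row_shift:.6f}")
print(f"total shift:   {total_shift:.4f}")
print(f"observed gap:  {observed_gap:.3f}")
```

The shift comes out at roughly 0.05 over all 49 rows and is positive, whereas the manual sum is about 42 lower, so the `3.14` constant does not account for the difference.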
Input data: the data set used above
Python notebook: my implementation
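For comparison, the same density formula can be evaluated directly in log form, which avoids taking `exp` and then `log` and avoids forming the explicit inverse. This is a minimal sketch for a full-rank covariance matrix only; the function name `mvn_logpdf` is hypothetical and the data are made up, not taken from the university data set.

```python
import numpy as np

def mvn_logpdf(x, mean, cov):
    """Sketch of the multivariate normal log-density for a full-rank cov.
    x: (n, d) array of rows; mean: (d,); cov: (d, d)."""
    x = np.atleast_2d(x)
    d = mean.shape[0]
    diff = x - mean
    sign, logdet = np.linalg.slogdet(cov)      # log|cov| without overflow/underflow
    if sign <= 0:
        raise ValueError("cov must be positive definite for this sketch")
    solved = np.linalg.solve(cov, diff.T).T    # cov^{-1} (x - mean), no explicit inverse
    maha = np.sum(diff * solved, axis=1)       # squared Mahalanobis distance per row
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

# Made-up data, only to exercise the function.
rng = np.random.default_rng(0)
mean = np.array([3.0, 50.0])
cov = np.array([[0.5, 0.2], [0.2, 4.0]])
rows = rng.multivariate_normal(mean, cov, size=5)

log_pdf = mvn_logpdf(rows, mean, cov)
print(log_pdf, log_pdf.sum())
```

Because this version has no eigenvalue cutoff, comparing it against a library result on the same `mean` and `cov` is one way to see whether the two differ in how they treat a near-singular covariance matrix.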