5

I'm trying to fit a multivariate normal distribution to data that I collected, in order to take samples from it. I know how to fit a (univariate) normal distribution, using the fitdist function (with the 'Normal' option).

How can I do something similar for a multivariate normal distribution?

Doesn't using `fitdist` on every dimension separately assume the variables are uncorrelated?

Shaked

3 Answers

1

There isn't any need for a specialized fitting function; the maximum likelihood estimates for the mean and covariance of a multivariate normal distribution are just the sample mean and the sample covariance matrix (normalized by n rather than n - 1, which makes little difference for large samples). I.e., compute the sample mean and sample covariance and you're done.
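As a minimal sketch of that computation, assuming the data sit in a matrix `X` with one observation per row (the synthetic `X` below is purely illustrative), the estimates can be obtained with base MATLAB functions. `cov(X,1)` normalizes by n, which is the maximum likelihood form, while plain `cov(X)` normalizes by n - 1.

```matlab
rng(1);                                   % reproducible illustration
% Hypothetical data: 200 observations (rows) of 3 correlated variables (columns)
X = randn(200, 3) * [1 0.5 0; 0 1 0.3; 0 0 1];
n = size(X, 1);

muHat         = mean(X, 1);               % 1-by-3 sample mean
SigmaMLE      = cov(X, 1);                % 3-by-3 covariance, normalized by n
SigmaUnbiased = cov(X);                   % same, normalized by n - 1

% Cross-check against the explicit formula (1/n) * sum (x_i - mu)(x_i - mu)'
Xc          = X - repmat(muHat, n, 1);    % center each column
SigmaManual = (Xc' * Xc) / n;
```

For samples of this size the two normalizations differ only by the factor (n - 1)/n, so either is usually acceptable for sampling purposes.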

Robert Dodier
  • I have had much better results with fitdist on univariate data than with manual calculation of mean and variance. – Masterfool Feb 23 '16 at 17:28
  • @Masterfool I'm honestly curious to know what you mean by "better results". What is `fitdist` returning, if not the sample mean and variance? – Robert Dodier Feb 23 '16 at 19:37
  • Big caveat: I mistook matlab for R. With that said, [fitdist](http://www.inside-r.org/packages/cran/fitdistrplus/docs/fitdist) can use any of the methods in the Details section of that link. method "mme" uses sample mean and variance, but the others use some kind of numerical optimization. The fitted parameters produced, for me, a better fit to the sample histogram. My understanding is rusty, but I suppose the sample mean and variance are not actually a mle of the parameters, and higher-likelihood params can be found via numerical optimization. – Masterfool Feb 24 '16 at 01:17
  • @Masterfool Thanks for the update. Sample mean and variance are mle for distribution mean and variance for a normal distribution, and OP did mention the normal distribution specifically. But I agree, if you broaden the search to look at other types of distributions, then in general you'll need something more than sample mean and variance. – Robert Dodier Feb 24 '16 at 06:14
  • My mistake; indeed I didn't consider that your comment was true for the normal distribution in particular. – Masterfool Feb 24 '16 at 20:35
0

Estimate the mean with `mean` and the variance-covariance matrix with `cov`. Then you can generate random numbers with `mvnrnd`. It is also possible to use `fitgmdist`, but for just a multivariate normal distribution `mean` and `cov` are enough.

Yes, using `fitdist` on every dimension separately treats the variables as uncorrelated, which is not what you want.
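The following sketch illustrates both points under some assumptions: the data are synthetic (`trueMu` and `trueSigma` are made-up values), rows of `X` are observations, and `mvnrnd` from the Statistics and Machine Learning Toolbox is available. It fits with `mean` and `cov`, samples with `mvnrnd`, and then mimics a per-dimension fit by keeping only the diagonal of the covariance matrix.

```matlab
rng(0);                                        % reproducible illustration
% Hypothetical data: 1000 observations of 2 correlated variables
trueMu    = [1 -2];
trueSigma = [2 1.2; 1.2 1];
X = mvnrnd(trueMu, trueSigma, 1000);

% Fit: sample mean and sample covariance matrix
muHat    = mean(X);                            % 1-by-2
SigmaHat = cov(X);                             % 2-by-2

% Draw new samples from the fitted multivariate normal
samples = mvnrnd(muHat, SigmaHat, 5000);

% Per-dimension fitting keeps only the variances (the diagonal),
% so the off-diagonal covariance is lost
SigmaIndep   = diag(diag(SigmaHat));
samplesIndep = mvnrnd(muHat, SigmaIndep, 5000);

corrJoint = corrcoef(samples);                 % off-diagonal stays close to the data's correlation
corrIndep = corrcoef(samplesIndep);            % off-diagonal is close to zero
```

Comparing `corrJoint(1,2)` with `corrIndep(1,2)` shows what the separate fits discard.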

FrancescoVe
0

You can use `[sigma,mu] = robustcov(X)`, where X is your multivariate data, i.e. X = [x1 x2 ... xn] and each xi is a column vector holding the observations of one variable.

Then you can use `Y = mvnpdf(X,mu,sigma)` to evaluate the estimated normal probability density function at the rows of X.
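A short sketch of these two calls, assuming the Statistics and Machine Learning Toolbox is installed and using made-up data with a few injected outliers. Note that `robustcov` is an outlier-resistant estimator rather than the plain sample covariance, so its output can differ from `cov(X)` when the data are contaminated.

```matlab
rng(2);                                      % reproducible illustration
% Hypothetical data: 500 observations of 2 variables, plus 10 shifted outliers
X = mvnrnd([0 0], [1 0.6; 0.6 2], 500);
X(1:10, :) = X(1:10, :) + 15;

% Robust estimates of the covariance matrix and the mean
[sigmaRob, muRob] = robustcov(X);

% Density of the fitted normal evaluated at each observation (one value per row)
Y = mvnpdf(X, muRob, sigmaRob);
```

If the goal is to draw samples rather than evaluate densities, the same `muRob` and `sigmaRob` can be passed to `mvnrnd`.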

https://www.mathworks.com/help/stats/normfit.html
https://www.mathworks.com/help/stats/mvnpdf.html