Which distribution fits data better?

Question

I use fitdistr in R to select which distribution fits my data best.

I've tried Cauchy, Weibull, normal, and Gamma distributions.

The log-likelihoods were: -329.8492 for Cauchy, -277.4931 for Gamma, -327.7622 for Normal, -279.0352 for Weibull.
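A minimal sketch of the workflow described in the question. It assumes simulated positive-valued data: the gamma sample `x` below is hypothetical and is not the asker's data. The code fits the same four candidate families with `MASS::fitdistr` and collects the `loglik` component of each fit.

```r
library(MASS)

# Hypothetical positive-valued data standing in for the asker's sample
set.seed(42)
x <- rgamma(300, shape = 3, rate = 1)

# Fit the four candidate families named in the question.
# The Cauchy, gamma and Weibull fits use numerical optimisation and may
# emit "NaNs produced" warnings while the optimiser explores the parameter space.
cands <- list(
  cauchy  = fitdistr(x, "cauchy"),
  gamma   = fitdistr(x, "gamma"),
  normal  = fitdistr(x, "normal"),
  weibull = fitdistr(x, "weibull")
)

# Maximised log-likelihood of each fitted model
lls <- sapply(cands, function(f) f$loglik)
print(lls)
```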

Which one is the best: the one with the largest value (i.e., Gamma) or the one with the largest absolute value (i.e., Cauchy)?

This question appears to be off-topic because it is about statistics. It may be better suited to http://stats.stackexchange.com — Barranka, Mar 16 '14 at 05:34
Check [this article](http://cran.r-project.org/doc/contrib/Ricci-distributions-en.pdf). Also, you should use some goodness-of-fit tests to find out (chi-squared and/or Kolmogorov-Smirnov tests). Get a good book on statistics. — Barranka, Mar 16 '14 at 05:35
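A minimal sketch of the Kolmogorov-Smirnov check suggested above, using base R's `ks.test` against distributions fitted by `MASS::fitdistr`. The simulated normal data are an assumption for illustration. Because the parameters are estimated from the same data, the standard KS p-values are only approximate (they tend to be too large), so treat them as a rough screen rather than a formal test.

```r
library(MASS)

# Illustrative data: assumed normal, matching the simulation used in the answer below
set.seed(1)
dat <- rnorm(500, 10, 1)

fit_norm <- fitdistr(dat, "normal")
fit_wei  <- fitdistr(dat, "weibull")

# One-sample KS tests: compare the empirical CDF with each fitted CDF.
# Arguments after the CDF name are passed on to pnorm / pweibull.
ks_norm <- ks.test(dat, "pnorm",
                   mean = fit_norm$estimate[["mean"]],
                   sd   = fit_norm$estimate[["sd"]])
ks_wei  <- ks.test(dat, "pweibull",
                   shape = fit_wei$estimate[["shape"]],
                   scale = fit_wei$estimate[["scale"]])

# Smaller D statistic = fitted CDF closer to the empirical CDF
c(normal = unname(ks_norm$statistic), weibull = unname(ks_wei$statistic))
```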
Well, if you accept that the evidence is strongest for the model with the greatest (i.e., least negative) log-likelihood, then the gamma distribution is the model which has the strongest evidence, among the ones you considered. But that's ignoring any prior information you have. For example, are the values only positive? If so, then the Cauchy and Gaussian models are impossible and you can exclude them a priori. More generally, think about the process which generated the data, and construct the model from that. — Robert Dodier, Mar 16 '14 at 21:09
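Applying that rule to the log-likelihoods reported in the question (copied from the text, no new data): ranking by value picks Gamma, whereas ranking by absolute value would pick Cauchy, which is actually the worst of the four.

```r
# Log-likelihoods as reported in the question
ll <- c(Cauchy = -329.8492, Gamma = -277.4931,
        Normal = -327.7622, Weibull = -279.0352)

# Higher (less negative) log-likelihood = more support from the data
ranking <- sort(ll, decreasing = TRUE)
print(ranking)

best  <- names(which.max(ll))       # "Gamma"
wrong <- names(which.max(abs(ll)))  # "Cauchy": the worst fit, not the best
```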

score 10 · Answer 1 · answered Mar 16 '14 at 06:26

Voting to close, but a simple test will answer your question:

```r
set.seed(1)
# we know these data are normally distributed...
dat <- rnorm(500, 10, 1)

# let's compute some fits...
library(MASS)
fits <- list(
  no = fitdistr(dat, "normal"),
  lo = fitdistr(dat, "logistic"),
  ca = fitdistr(dat, "cauchy"),
  we = fitdistr(dat, "weibull")
)

# get the logliks for each model...
sapply(fits, function(i) i$loglik)
```

```
       no        lo        ca        we 
-718.3558 -722.1342 -806.2398 -741.2754
```

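So the largest (i.e., least negative) log-likelihood indicates the best fit: we simulated normally distributed data, and the normal fit has the largest log-likelihood. Raw log-likelihoods are directly comparable here because every candidate has two parameters; when parameter counts differ, use a penalized criterion such as AIC instead.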
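A sketch of the AIC version of this comparison. It recomputes the answer's fits and applies AIC = 2k - 2 log L by hand, where k is taken as the length of each fit's `estimate` vector. Since all four families here have two parameters, the AIC ranking should match the log-likelihood ranking; the penalty only changes the order when parameter counts differ.

```r
library(MASS)

# Same simulated data and fits as in the answer
set.seed(1)
dat <- rnorm(500, 10, 1)
fits <- list(
  no = fitdistr(dat, "normal"),
  lo = fitdistr(dat, "logistic"),
  ca = fitdistr(dat, "cauchy"),
  we = fitdistr(dat, "weibull")
)

# k = number of estimated parameters in each fit
k <- sapply(fits, function(f) length(f$estimate))

# AIC = 2k - 2 * maximised log-likelihood; lower is better
aic <- sapply(fits, function(f) 2 * length(f$estimate) - 2 * f$loglik)
sort(aic)
```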

You might also find this image useful, from http://people.stern.nyu.edu/adamodar/pdfiles/papers/probabilistic.pdf

[Image from the paper linked above]

I think it is "logis" rather than "logistic" — ivo Welch, Apr 12 '15 at 21:16
