3

Trying to fit a chi_square distribution using fitdistr() in R. Documentation on this is here (and not very useful to me): https://stat.ethz.ch/R-manual/R-devel/library/MASS/html/fitdistr.html

Question 1: chi_df below has the following output: 3.85546875 (0.07695236). What is the second number? The Variance or standard deviation?

Question 2: fitdistr generates 'k' defined by the Chi-SQ distribution. How do I fit the data so I get the scaling constant 'A'? I am dumbly using lines 14-17 below. Obviously not good.

Question 3: Is the Chi-SQ distribution only defined for a certain x-range? (Variance is defined as 2K, while mean = k. This must require some constrained x-range... Stats question not programming...)

nnn = 1000;
## Generating a chi-sq distribution
chii <- rchisq(nnn,4, ncp = 0);  
## Plotting Histogram
chi_hist <- hist(chii);   
## Fitting. Gives probability density which must be scaled.
chi_df <- fitdistr(chii,"chi-squared",start=list(df=3)); 
chi_k <- chi_df[[1]][1];

## Plotting a fitted line:
## Spanning x-length of chi-sq data
x_chi_fit <- 1:nnn*((max(chi_hist[[1]][])-min(chi_hist[[1]][]))/nnn);

## Y data using eqn for probability function
y_chi_fit <- (1/(2^(chi_k/2)*gamma(chi_k/2)) * x_chi_fit^(chi_k/2-1) * exp(-x_chi_fit/2));
## Normalizing to the peak of the histogram
y_chi_fit <- y_chi_fit*(max(chi_hist[[2]][]/max(y_chi_fit)));

## Plotting the line
lines(x_chi_fit,y_chi_fit,lwd=2,col="green");

Thanks for your help!

Entropy
  • 133
  • 2
  • 12
  • 2
    1) I gather the second value is a standard deviation around the estimate, as per `str(chi_df)` and the `Value:` section of the help file. – thelatemail Mar 03 '15 at 22:05
  • 1
    “Documentation on this is terrible” – can you clarify this? I find the documentation quite nice. For instance, it more or indirectly answers your first question. – Konrad Rudolph Mar 04 '15 at 14:30
  • @rawr There’s no need for those semicolons. In fact, **do not use semicolons in R** – Konrad Rudolph Mar 04 '15 at 14:30
  • 3
    @rawr's first comment was sarcastic, I think. The second is obscurely pointing this out (I think -- it's even more obscure to me). (Note: sarcasm/irony don't necessarily translate very well in on-line contexts, especially international ones ...) – Ben Bolker Mar 04 '15 at 15:25
  • precisely, Prof. Bolker – rawr Mar 04 '15 at 15:33
  • @rawr Okay, thanks for clarifying. The thing is, OP obviously didn’t get this, which is why I found it necessary to point this out. – Konrad Rudolph Mar 04 '15 at 16:26
  • In my comment on the documentation, I was merely referring to the variety of parameters and syntax that fitdistr() takes. For example, it took me over an hour to figure out how the syntax for 'start=list(df=3)'. It would be nice if the documentation covered all the parameter inputs/ examples. That said, overall I find it satisfactory for the many other times I've used it. – Entropy Mar 04 '15 at 19:29

1 Answers1

6
  1. As commented above, ?fitdistr says

An object of class ‘"fitdistr"’, a list with four components, ... sd: the estimated standard errors,

... so that parenthesized number is the standard error of the parameter.

  1. The scale parameter doesn't need to be estimated; you need either to scale by the width of your histogram bins or just use freq=FALSE when drawing your histogram. See code below.

  2. The chi-squared distribution is defined on the non-negative reals, which makes sense since it's the distribution of a squared standard Normal (this is a statistical, not a programming question).

Set up data:

nnn <- 1000
## ensure reproducibility; not a big deal in this case,
##  but good practice
set.seed(101)
## Generating a chi-sq distribution
chii <- rchisq(nnn,4, ncp = 0)  

Fitting.

library(MASS)
## use method="Brent" based on warning
chi_df <- fitdistr(chii,"chi-squared",start=list(df=3),
                   method="Brent",lower=0.1,upper=100)
chi_k <- chi_df[[1]][1]

(For what it's worth, it looks like there might be a bug in the print method for fitdistr when method="Brent" is used. You could also use method="BFGS" and wouldn't need to specify bounds ...)

Histograms

chi_hist <- hist(chii,breaks=50,col="gray")
## scale by N and width of histogram bins
curve(dchisq(x,df=chi_k)*nnn*diff(chi_hist$breaks)[1],
      add=TRUE,col="green")
## or plot histogram already scaled to a density
chi_hist <- hist(chii,breaks=50,col="gray",freq=FALSE)   
curve(dchisq(x,df=chi_k),add=TRUE,col="green")
Ben Bolker
  • 211,554
  • 25
  • 370
  • 453
  • Clarifying Question 1: Standard error as in StDev/sqrt(counts)? freq=FALSE is just what I was looking for. Thank you so so much, you all here are seriously the best! – Entropy Mar 04 '15 at 19:20
  • Additionally, how do I get a proper fit if I scale 'rchisq' along the X-axis?: `chii <- 100*rchisq(nnn,4, ncp = 0) ` – Entropy Mar 04 '15 at 20:00
  • the standard error of the parameters is computed by estimating the curvature of the log-likelihood surface at the maximum log-likelihood estimate -- not the same as computing the standard error of a mean by scaling the sample standard deviation by sqrt(counts) [you asked ...] If your values are scaled, that's a harder problem that is going to require a bit of custom coding. Perhaps another question. – Ben Bolker Mar 04 '15 at 23:19