
I have a gamma distribution fit to my data using library(fitdistrplus). I need a method for defining the range of x values that can be "reasonably" expected, analogous to using standard deviations with normal distributions.

For example, x values within two standard deviations from the mean could be considered to be the reasonable range of expected values from a normal distribution. Any suggestions for how to define a similar range of expected values based on the shape and rate parameters of a gamma distribution?

...maybe something like identifying the two values of x between which 95% of the data fall?

viridius
As far as I know (which isn't very far), when working outside of the normal distribution, you need to normalize the data if you want standard deviations to be meaningful. Otherwise you can use percentiles to obtain the values that bound 95 percent of the data, with the understanding that they function differently from a standard deviation from a probability standpoint. – Badger Oct 09 '15 at 21:40

2 Answers


The expected value (mean) of a gamma distribution is:

E[X] = k * theta

The variance is Var[X] = k * theta^2, where k is the shape and theta is the scale (theta = 1/rate if you have a rate parameter instead).

But typically I would use the 2.5% and 97.5% quantiles, which bound the central 95% of the distribution, to indicate data spread.
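
As a rough sketch of the formulas above, the moments can be computed directly from a shape and rate pair. The values below are assumed for illustration only; with a real fit you would substitute your own estimates.

```r
# Assumed example parameters (not taken from any real fit)
shape <- 2
rate  <- 3

# Convert rate to scale: theta = 1 / rate
theta <- 1 / rate

mean_x <- shape * theta      # E[X]   = k * theta
var_x  <- shape * theta^2    # Var[X] = k * theta^2
sd_x   <- sqrt(var_x)        # standard deviation

print(c(mean = mean_x, variance = var_x, sd = sd_x))
```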

Orest Hera
MC Kwit

Let's assume we have a random variable that is gamma distributed with shape alpha=2 and rate beta=3. We would expect this distribution to have mean 2/3 and standard deviation sqrt(2)/3, and indeed we see this in simulated data:

mean(rgamma(100000, 2, 3))
# [1] 0.6667945
sd(rgamma(100000, 2, 3))
# [1] 0.4710581
sqrt(2) / 3
# [1] 0.4714045

It would be pretty weird to define confidence ranges as [mean - gamma*sd, mean + gamma*sd] for some multiplier gamma. To see why, consider gamma=2 in the example above. This would yield the range [-0.276, 1.609], but the gamma distribution can't take on negative values, so no data fall below the range, while about 4.7% of data fall above 1.609. This is at the very least not a well-balanced interval.
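
One way to check this imbalance, sketched here with base R's pgamma and the same example parameters (shape 2, rate 3), is to compute the probability mass in each tail of the mean ± 2 sd interval. The variable names are only illustrative.

```r
shape <- 2
rate  <- 3

m <- shape / rate          # mean of the gamma distribution
s <- sqrt(shape) / rate    # standard deviation

lower <- m - 2 * s         # negative, outside the support of the gamma
upper <- m + 2 * s

# Probability mass below and above the interval
p_below <- pgamma(lower, shape = shape, rate = rate)
p_above <- pgamma(upper, shape = shape, rate = rate, lower.tail = FALSE)

print(c(lower = lower, upper = upper, p_below = p_below, p_above = p_above))
```

All of the mass outside the interval sits in the upper tail, which is why the symmetric interval is a poor fit for a skewed distribution.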

A more natural choice might be to take the 0.025 and 0.975 quantiles (the 2.5th and 97.5th percentiles) of the distribution as a confidence range. We would expect 2.5% of data to fall below this range and 2.5% of data to fall above it. We can use qgamma to determine that for our example parameters the range would be [0.081, 1.857].

qgamma(c(0.025, 0.975), 2, 3)
# [1] 0.08073643 1.85721446
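
As a sanity check, here is a small simulation sketch of how much data the quantile-based range actually covers. The seed and sample size are arbitrary choices, and with a fitted model you would presumably pass your estimated shape and rate to qgamma instead of the example values.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

shape <- 2
rate  <- 3

# Simulated data from the example distribution
x <- rgamma(100000, shape = shape, rate = rate)

# Central 95% range from the theoretical quantiles
bounds <- qgamma(c(0.025, 0.975), shape = shape, rate = rate)

# Fractions of simulated data below, inside, and above the range
below    <- mean(x < bounds[1])
coverage <- mean(x >= bounds[1] & x <= bounds[2])
above    <- mean(x > bounds[2])

print(c(below = below, coverage = coverage, above = above))
```

The fractions below and above should each come out close to 2.5%, with roughly 95% of the simulated values inside the range.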
josliber