3

Suppose that we have this gamma distribution in MATLAB:

enter image description here

I want this part of distribution with higher density (x-axis range). How can I extract this in MATLAB? I fit this distribution using histfit function.

enter image description here

PS. My codes:

figure;
histfit(Data,20,'gamma');
    [phat, pci] = gamfit(Data);

phat =

   11.3360    4.2276

pci =

    8.4434    3.1281
   15.2196    5.7136
Eghbal
  • 3,892
  • 13
  • 51
  • 112
  • 3
    how do you define the 'part with higher density'? Have you looked at [`betainv`](http://www.mathworks.com/help/stats/betainv.html)? – zeeMonkeez Dec 02 '15 at 14:52
  • @zeeMonkeez. I want calculate this range using percentage or something like this. For example range of 40% of data with highest density. – Eghbal Dec 02 '15 at 14:53
  • 1
    So it really sounds like a use for `betainv`. The man page even has an example doing exactly what you want. – zeeMonkeez Dec 02 '15 at 14:55
  • @zeeMonkeez. Sorry. My distribution is `gamma`. – Eghbal Dec 02 '15 at 15:04
  • 2
    Let me guess. In that case, it would be [`gaminv`](http://www.mathworks.com/help/stats/gaminv.html) – zeeMonkeez Dec 02 '15 at 15:11
  • @zeeMonkeez Can you prepare a short example in answers? There are to other inputs (`pcov` and `alpha`) here to specify. I think a short example of codes really helps. – Eghbal Dec 02 '15 at 15:13
  • 2
    How about you post some code. All I could post for examples would be pretty much what the documentation already has. – zeeMonkeez Dec 02 '15 at 15:15
  • @zeeMonkeez. Question edited. thanks. – Eghbal Dec 02 '15 at 15:17

2 Answers2

2

When you fit a gamma distribution to your data with [phat, pci] = gamfit(Data);, phat contains the MLE parameters. You can plug this into gaminv:

x = gaminv(p, phat(1), phat(2));

where p is a vector of percentages, e.g. p = [.2, .8].

zeeMonkeez
  • 5,057
  • 3
  • 33
  • 56
  • thank you for answer. the output for x is `35.72` and `59.311`. What is meaning of these values here? – Eghbal Dec 02 '15 at 15:25
  • These are the values on the `x` axis, corresponding to the 20th and 80th percentile. – zeeMonkeez Dec 02 '15 at 15:26
  • So here 20% of data are lower than 35.7 and 80% of data are lower than 59.3. Is this true? and 60% of data are between 35.7 and 59.3. – Eghbal Dec 02 '15 at 15:34
1

I always base my code on the following. This is a code once made which I now alter where necessary. Maybe you will find it usefull as well.

table2.10 <- cbind(lower=c(0,2.5,7.5,12.5,17.5,22.5,32.5,
                           47.5,67.5,87.5,125,225,300),upper=c(2.5,7.5,12.5,17.5,22.5,32.5,
                                                               47.5,67.5,87.5,125,225,300,Inf),freq=c(41,48,24,18,15,14,16,12,6,11,5,4,3))
loglik <-function(p,d){
  upper <- d[,2]
  lower <- d[,1]
  n <- d[,3]
  ll<-n*log(ifelse(upper<Inf,pgamma(upper,p[1],p[2]),1)-
              pgamma(lower,p[1],p[2]))
  sum( (ll) )
}

p0 <- c(alpha=0.47,beta=0.014)
m <- optim(p0,loglik,hessian=T,control=list(fnscale=-1),
           d=table2.10)

theta <- qgamma(0.95,m$par[1],m$par[2])
theta

One can also create a 95% confidence interval by using the Delta method. To do so, we need to differentiate the distribution function of the claims F^(−1)_{X} (0.95; α, β) with respect to α and β.

p <- m$par
eps <- c(1e-5,1e-6,1e-7)
d.alpha <- 0*eps
d.beta <- 0*eps
for (i in 1:3){
d.alpha[i] <- (qgamma(0.95,p[1]+eps[i],p[2])-qgamma(0.95,p[1]-eps[i],p[2]))/(2*eps[i])
d.beta[i] <- (qgamma(0.95,p[1],p[2]+eps[i])-qgamma(0.95,p[1],p[2]-eps[i]))/(2*eps[i])
}
d.alpha

d.beta

var.p <- solve(-m$hessian)
var.q95 <- t(c(d.alpha[2],d.beta[2])) %*% var.p %*% c(d.alpha[2],d.beta[2])
qgamma(0.95,p[1],p[2]) + qnorm(c(0.025,0.975))*sqrt(c(var.q95))

It is even possible using the parametric bootstrap on the estimates for α and β to get B = 1000 different estimates for the 95th percentile of the loss distribution. And use these estimates to construct a 95% confidence interval

library(mvtnorm)
B <- 10000
q.b <- rep(NA,B)
for (b in 1:B){
p.b <- rmvnorm(1,p,var.p)
if (!any(p.b<0)) q.b[b] <- qgamma(0.95,p.b[1],p.b[2])
}

quantile(q.b,c(0.025,0.975))

To do the nonparametrtic bootstrap, we first ‘expand’ the data to reflect each individual observation. Then we sample with replacement from the line numbers, calculate the frequency table, estimate the model and its 95% percentile.

line.numbers <- rep(1:13,table2.10[,"freq"])
q.b <- rep(NA,B)
table2.10b <- table2.10
for (b in 1:B){
line.numbers.b <- sample(line.numbers,size=217,replace=TRUE)
table2.10b[,"freq"] <- table(factor(line.numbers.b,levels=1:13))
m.b <- optim(m$par,loglik,hessian=T,control=list(fnscale=-1),
d=table2.10b)
q.b[b] <- qgamma(0.95,m.b$par[1],m.b$par[2])
}
q.npb <- q.b
quantile(q.b,c(0.025,0.975))