How is the CDF ("pgamma") of the Average of 10 Samples from Two Gammas Derived?

Question

Case I: One Gamma (I can do this!)

shape<-shape*10

scale<-scale/10

p_value_average_of_10_draws<-1-pgamma(q=average_of_10_draws, shape=shape, scale=scale, lower.tail = TRUE, log.p = FALSE)

Case II: Two Gammas (I can't do this!)

shape_A<-shape_A*10

scale_A<-scale_A/10

shape_B<-shape_B*10

scale_B<-scale_B/10

pgamma_A_and_B <-
    pgamma(q=average_of_10_draws, shape=shape_A, scale=scale_A, lower.tail = TRUE, log.p = FALSE)*weight_A +
    pgamma(q=average_of_10_draws, shape=shape_B, scale=scale_B, lower.tail = TRUE, log.p = FALSE)*(1-weight_A)

p_value_average_of_10_draws<-1-pgamma_A_and_B

But this is just wrong, because it assumes that all ten draws will be taken from just one of A or B!
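A quick way to see the size of the discrepancy is to simulate it. The sketch below assumes that each of the ten draws independently comes from A with probability `weight_A` (my reading of the question, not something it states outright), and all parameter values and the observed average are hypothetical, chosen only for illustration.

```r
set.seed(1)

# Hypothetical mixture parameters, for illustration only
weight_A <- 0.6
shape_A <- 2.0; scale_A <- 1.0
shape_B <- 5.0; scale_B <- 2.0
n_draws <- 10
n_sim <- 20000

# One simulated average: every draw picks A or B independently
sim_mean <- function() {
    from_A <- runif(n_draws) < weight_A
    x <- ifelse(from_A,
                rgamma(n_draws, shape = shape_A, scale = scale_A),
                rgamma(n_draws, shape = shape_B, scale = scale_B))
    mean(x)
}
sim_means <- replicate(n_sim, sim_mean())

average_of_10_draws <- 8.0  # hypothetical observed average

# Simulated upper-tail probability of the average
p_sim <- mean(sim_means >= average_of_10_draws)

# Case II formula: treats all ten draws as coming from A, or all from B
p_naive <- 1 - (
    weight_A * pgamma(average_of_10_draws,
                      shape = shape_A * n_draws, scale = scale_A / n_draws) +
    (1 - weight_A) * pgamma(average_of_10_draws,
                            shape = shape_B * n_draws, scale = scale_B / n_draws)
)

print(c(simulated = p_sim, naive = p_naive))
```

With these assumed parameters the Case II value comes out considerably larger than the simulated one, which is the effect described above.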

Severin Pappadeux · Answer 1 · 2016-04-18T21:20:38.827


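Well, there is a well-known rule for the probability density function (PDF) of the sum of two independent random variates, Z = X + Y, in terms of their individual PDFs:

PDF_z(z) = ∫ PDF_x(t) * PDF_y(z-t) dt

where the integral runs over all t for which both densities are nonzero (from 0 to z for gammas). I'm not sure there is a generic closed-form expression for the sum of gammas with arbitrary parameters (when both share the same scale, the sum is again a gamma whose shape is the sum of the two shapes). There are packages in R which do the numeric convolution in the integral above.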

Is that what you want?
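For the average of n draws from a two-gamma mixture, one possible way to use this rule is to condition on how many of the n draws came from component A. Assuming each draw independently comes from A with probability w, that count is Binomial(n, w); given the count k, the total is the sum of a Gamma(k*shape_A, scale_A) and an independent Gamma((n-k)*shape_B, scale_B), whose CDF can be obtained by numerical integration. The sketch below is only an illustration under that assumption; the function names and example parameters are hypothetical.

```r
# CDF at s of S_A + S_B, with S_A ~ Gamma(a1, s1) and S_B ~ Gamma(a2, s2)
# independent: integral over t in [0, s] of f_A(t) * F_B(s - t)
sum_two_gammas_cdf <- function(s, a1, s1, a2, s2) {
    if (s <= 0) return(0)
    integrand <- function(t) {
        dgamma(t, shape = a1, scale = s1) * pgamma(s - t, shape = a2, scale = s2)
    }
    integrate(integrand, lower = 0, upper = s)$value
}

# CDF of the average of n draws from the mixture w*Gamma_A + (1-w)*Gamma_B
mixture_mean_cdf <- function(q, n, w, shape_A, scale_A, shape_B, scale_B) {
    s <- n * q                        # average <= q  is the same as  sum <= n*q
    total <- 0
    for (k in 0:n) {                  # k = number of draws that came from A
        p_k <- dbinom(k, size = n, prob = w)
        cdf_k <- if (k == 0) {
            pgamma(s, shape = n * shape_B, scale = scale_B)
        } else if (k == n) {
            pgamma(s, shape = n * shape_A, scale = scale_A)
        } else {
            sum_two_gammas_cdf(s, k * shape_A, scale_A,
                               (n - k) * shape_B, scale_B)
        }
        total <- total + p_k * cdf_k
    }
    total
}

# Hypothetical example: upper-tail probability of an observed average of 8.0
p_value <- 1 - mixture_mean_cdf(8.0, n = 10, w = 0.6,
                                shape_A = 2, scale_A = 1,
                                shape_B = 5, scale_B = 2)
print(p_value)
```

A simple sanity check is to give both components the same shape and scale: the result should then match the single-gamma case, `pgamma(q, shape = n*shape, scale = scale/n)`. Note that `integrate` can struggle when a shape parameter is below 1 (the density is unbounded at 0), so the result is worth checking against a simulation in that regime.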

UPDATE

K-S example test of two gammas

library(ggplot2)

fitted.pdf <- function(x, w, a1, s1, a2, s2) {
    w*dgamma(x, shape = a1, scale = s1) + (1.0-w)*dgamma(x, shape = a2, scale = s2)
}

fitted.cdf <- function(x, w, a1, s1, a2, s2) {
    w*pgamma(x, shape = a1, scale = s1) + (1.0-w)*pgamma(x, shape = a2, scale = s2)
}

p <- ggplot(data = data.frame(x = 0), mapping = aes(x = x))
p <- p + stat_function(fun = function(x) fitted.pdf(x, w=0.6, a1=1.0, s1=1.0, a2=0.8, s2=1.2))
p <- p + stat_function(fun = function(x) fitted.cdf(x, w=0.6, a1=1.0, s1=1.0, a2=0.8, s2=1.2))
p <- p + xlim(0.0, 4.0) + ylim(0.0, 1.0)
print(p)

# sample 100 from exponential
x <- rexp(100)

# K-S test
q <- ks.test(x, y=function(x) fitted.cdf(x, w=0.6, a1=1.0, s1=1.0, a2=0.8, s2=1.2))
print(q)

edited Apr 18 '16 at 21:20

answered Apr 17 '16 at 01:17

Severin Pappadeux


I see your point, that could be too complicated. Let me restate the problem this way: There are two sets of data. I fit a two gamma mixture model to the first. What is the probability that the second set of data could have been generated by the pdf composed of those two weighted gammas? – rwinkel2000 Apr 17 '16 at 18:29
@rwinkel2000 So, you build a numerical PDF and fit it with something like `w1*Gamma(a1,b1) + w2*Gamma(a2,b2)`? (With the condition `w1+w2=1`) – Severin Pappadeux Apr 17 '16 at 23:59
That's right. Now how do I calculate the probability that the second set of data could have been generated by that numerical PDF? – rwinkel2000 Apr 18 '16 at 01:28
@rwinkel2000 There are many ways to do that, one being histogram comparison. Another one, which I like, is a Q-Q plot, which is basically a CDF-to-CDF comparison. I have no **R** code, but take a look at the very good description here: http://onlinestatbook.com/2/advanced_graphs/q-q_plots.html – Severin Pappadeux Apr 18 '16 at 02:28
@rwinkel2000 another link http://www.itl.nist.gov/div898/handbook/eda/section3/qqplot.htm – Severin Pappadeux Apr 18 '16 at 02:31
Which in your opinion is better: qq or k-s? https://stat.ethz.ch/R-manual/R-devel/library/stats/html/ks.test.html I'm looking at this ks link and trying to think of how I should represent a fitted mixture model (as opposed to a single pdf model) into this requirement: "y...either a numeric vector of data values, or a character string naming a cumulative distribution function or an actual cumulative distribution function such as pnorm." – rwinkel2000 Apr 18 '16 at 13:12
@rwinkel2000 There are many tests which could be used. Q-Q and K-S are in different leagues. Q-Q is more for exploratory analysis, to visually check whether there are areas of disagreement. K-S is more about quantifying the difference, by computing a "distance" between distributions. You could use either or both – Severin Pappadeux Apr 18 '16 at 14:22
I need an actual p-value so I'd like to use K-S, but will the R implementation ks.test(...) let me test against a "custom" pdf? That's my final puzzle. – rwinkel2000 Apr 18 '16 at 17:14
@rwinkel2000 It should. I put some code in the update, which tests an exponential sample against two weighted gammas – Severin Pappadeux Apr 18 '16 at 21:21
@rwinkel2000 Apparently, there is code in **R** which does a Q-Q comparison of a sampled vs. a theoretical/fitted distribution. Take a look at http://www.inside-r.org/r-doc/lattice/qqmath – Severin Pappadeux May 03 '16 at 15:07
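
The Q-Q comparison discussed in these comments could also be sketched in base R, without `lattice`. The mixture has no closed-form quantile function, so the sketch below inverts the fitted CDF numerically with `uniroot`. The helper name `fitted.quantile`, the search interval and the parameter values are assumptions made for illustration; the parameters simply reuse those from the K-S example.

```r
set.seed(1)

# CDF of the two-gamma mixture (same form as in the K-S example)
fitted.cdf <- function(x, w, a1, s1, a2, s2) {
    w*pgamma(x, shape = a1, scale = s1) + (1.0-w)*pgamma(x, shape = a2, scale = s2)
}

# Hypothetical helper: mixture quantiles by numerically inverting the CDF.
# The upper search bound of 1e3 is an assumption that suits these parameters.
fitted.quantile <- function(p, w, a1, s1, a2, s2) {
    sapply(p, function(p_i) {
        uniroot(function(x) fitted.cdf(x, w, a1, s1, a2, s2) - p_i,
                lower = 0, upper = 1e3, tol = 1e-8)$root
    })
}

x <- rexp(100)                   # stand-in for the second data set
probs <- ppoints(length(x))      # plotting positions strictly inside (0, 1)
theo <- fitted.quantile(probs, w=0.6, a1=1.0, s1=1.0, a2=0.8, s2=1.2)

# Points near the line y = x suggest agreement with the fitted mixture
plot(theo, sort(x),
     xlab = "Fitted mixture quantiles", ylab = "Sample quantiles")
abline(0, 1)
```

This only gives a visual check; for an actual p-value the `ks.test` call in the answer's update is the relevant tool.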
