0

I am trying to compute a log-likelihood, where the likelihood is a convolution of two distributions, and the distribution parameters depend on the value of that data point. More formally, I believe my data follows a likelihood which is the sum of two gamma distributions, which have separate shape and scale parameters and the shape parameter is a function of data. Written mathematically, yi=pGamma(k1+b1 xi, t1)+(1-p)Gamma(k2+b2 xi, t2). Note the plus sign is an arithmetic operation on the random variables, not between the CDFs. This is not a mixture model. The likelihood is a convolution between weighted random variables that have to be done per observation, which is very slow.

I currently am using the distr package, and this is what I have to compute the likelihood.

  sapply(1:10000, function(i){distr::d(p*Gammad(shape = k1+b1*x[i], scale = t1)+(1-p)*Gammad(shape = k2+b2*x[i], scale = t2))(y[i])})

Is there a way to vectorize this? Is there any other way to speed up the code?

When I try

distr::d(Vectorize(p*Gammad(shape = k1+b1*x, scale = t1)+(1-p)*Gammad(shape = k2+b2*x, scale = t2)))(y)

I get the error "Error in .multm(e1, e2, "AbscontDistribution") : length of operator must be 1"

1 Answers1

0

First, note that if

A ~ Gamma(k1 + b1*xi, t1)
B ~ Gamma(k2 + b2*xi, t2)

then

p*A ~ Gamma(k1 + b1*xi, p*t1)
(1 - p)*B ~ Gamma(k2 + b2*xi, (1 - p)*t2)

The closed-form PDF of the sum of two gamma random variates is known and involves Kummer's (confluent hypergeometric) function.

Using chf_1F1 from the scModels package, I believe the following function would return the log-likelihoods.

library(scModels)

LL <- function(k1, b1, t1, k2, b2, t2, x, p, y) {
  a1 <- k1 + b1*x
  a2 <- k2 + b2*x
  a <- a1 + a2
  b1 <- 1/(p*t1)
  b2 <- 1/((1 - p)*t2)
  a1*log(b1) + a2*log(b2) - lgamma(a) - b1*y + (a - 1)*log(y) + chf_1F1((b1 - b2)*y, a2, a)
}
jblood94
  • 10,340
  • 1
  • 10
  • 15