Coding likelihood and log-likelihood function in r to perform optimization

Question

I am working on a paper that requires me to find the MLE of Gumbel’s type I bivariate exponential distribution. I have proved the likelihood and log-likelihood functions likelihood and log-likelihood but I am struggling to implement it in r to perform optimization with Optim function. My code generates NA values. Below are my codes.

# likelihood function of x
likelihood.x = function(params, data) {
  lambda1 = params[1]
  lambda2 = params[2]
  theta = params[3]
  A = (1 - theta) * (lambda1 * lambda2)
  B = theta * (lambda1 ^ 2) * lambda2 * data$X1
  C = theta * lambda1 * (lambda2 ^ 2) * data$X2
  D = (theta ^ 2) * (lambda1 ^ 2) * (lambda2 ^ 2) * data$X1 * data$X2
  E = (lambda1 * data$X1) + (lambda2 * data$X2) + (theta * lambda1 * lambda2 * data$X1 * data$X2)
  f = sum(log(A + B + C + D)) - sum(E)
  return(exp(f))
}

# Log-likelihood function of x
log.likelihood.x = function(params, data){
  lambda1 = params[1]
  lambda2 = params[2]
  theta = params[3]
  A = (1 - theta) * (lambda1 * lambda2)
  B = theta * (lambda1 ^ 2) * lambda2 * data$X1
  C = theta * lambda1 * (lambda2 ^ 2) * data$X2
  D = (theta ^ 2) * (lambda1 ^ 2) * (lambda2 ^ 2) * data$X1 * data$X2
  E = (lambda1 * data$X1) + (lambda2 * data$X2) + (theta * lambda1 * lambda2 * data$X1 * data$X2)
  f = sum(log(A + B + C + D)) - sum(E)
  return(-f)
}

Here's the function for generating the data

# Simulating data
rGBVE = function(n, lambda1, lambda2, theta) {
  x1 = rexp(n, lambda1)
  lambda12 = lambda1 * lambda2
  pprod = lambda12 * theta
  C = exp(lambda1 * x1)
  A = (lambda12 - pprod + pprod * lambda1 * x1) / C
  B = (pprod * lambda2 + pprod ^ 2 * x1) / C
  D = lambda2 + pprod * x1
  wExp = A / D
  wGamma = B / D ^ 2
  data.frame(x1, x2 = rgamma(n, (runif(n) > wExp / (wExp + wGamma)) + 1, D))
}

data = rGBVE(n=100, lambda1 = 1.2, lambda2 = 1.4, theta = 0.5)
colnames(data) = c("X1", "X2")

My goal is to find MLE for lambda1, lambda2 and theta using Optim() in r.

Kindly assist me to implement my likelihood and log-likelihood function in r. Thank you.

can we please have a [mcve]? That is, give us a **small** set of data and some parameters that generate `NA` values. — Ben Bolker, May 26 '22 at 15:47
I'd also encourage you to consistently use spaces between your operators. `theta*lambda1*lambda2*data$X1*data$X2` is quite hard to read compared to `theta * lambda1 * lambda2 * data$X1 * data$X2`. On the `E` definition, maybe line breaks after each `+`. If you are using RStudio, highlighting the code and selecting `Code > Reformat Code` will add the spaces for you. — Gregor Thomas, May 26 '22 at 16:11
I have included the function for simulating data and reformated the codes. Hope it's much better now. — Roland Fiagbe, May 26 '22 at 19:26
Your data generating function creates a data frame with variables `x1` and `x2` (lowercase) but your log-likelihood function expects `data$X1` and `data$X2` (uppercase) — lhs, May 27 '22 at 00:35
I renamed the columns to uppercase in my work so that is not really the problem. The main problem I need help with is confirming if my codes for the likelihood and log-likelihood function are correct based on the function I attached above. Here is the function https://i.stack.imgur.com/FQkTl.png — Roland Fiagbe, May 27 '22 at 15:09
Your code as given works OK for me: `likelihood.x(c(1.2,1.4,0.5), data); log.likelihood.x(c(1.2,1.4,0.5), data); optim(par = c(1.2,1.4,0.5), fn = log.likelihood.x, dat = data)` all give somewhat reasonable answers. Can you give a [mcve] where it produces NA values? (Using starting values of `c(1, 2, 1)` warns about producing `NaN` values, but still converges to the same reasonable answer.) — Ben Bolker, May 27 '22 at 15:18
@BenBolker thanks. My results do converge to reasonable answers but warn me about producing NaN values. So I thought there was a problem with my log-likelihood function. I guess the problem is the starting value. Is there any way to get a good starting value for optimization? — Roland Fiagbe, May 27 '22 at 18:02

score 4 · Accepted Answer · answered May 27 '22 at 23:05

Your concern appears to be about the warning message

In log(A+B+C+D): NaNs produced

Such warnings are usually harmless — it just means that the optimization algorithm tried a set of parameters somewhere along the way that violated the condition A+B+C+D ≥ 0. Since these are reasonably complex expressions it would take a little bit of effort to figure out how one might constrain the parameters (or reparameterize the function, e.g. fitting some of the parameters on the log scale) to avoid the warning, but taking a guess that keeping the parameters non-negative will help, we can try using the L-BFGS-B algorithm (which is the only algorithm available in optim() that allows multidimensional bounded optimization).

r1 <- optim(par = c(1,2,1),
      fn = log.likelihood.x,
      dat  = data)

r2 <- optim(par = c(1,2,1),
      fn = log.likelihood.x,
      lower = rep(0,3),
      method = "L-BFGS-B",
      dat  = data)

The second does not generate warnings, and the results are close (if not identical):

all.equal(r1$par, r2$par)
## "Mean relative difference: 0.0001451953"

You might want to use bbmle, which has some additional features for likelihood modeling:

library(bbmle)
fwrap <- function(x) log.likelihood.x(x, dat = data)
parnames(fwrap) <- c("lambda1", "lambda2", "theta")
m1 <- mle2(fwrap, start = c(lambda1 = 1, lambda2 = 2, theta = 1), vecpar = TRUE,
           method = "L-BFGS-B", lower = c(0, 0, -0.5))
pp <- profile(m1)
plot(pp)
confint(pp)
confint(m1, method = "quad")

Coding likelihood and log-likelihood function in r to perform optimization

1 Answers1

Linked