2

I'm working on setting up a maximum likelihood estimator to estimate the parameters for a dirichlet-multinomial distribution. Based on what I've seen elsewhere, it looks like the function ddirichlet.multinom() is working as expected, but when I pass to the log-likelihood function, mle seems to want to push it to the lower bounds! I figure I've mixed something up in either the function ll() or in the params for mle(), but can't seem to sort out what the issue is. Any guidance here would be much appreciated!

library(tidyverse)

# setup a set of example data using a known dirichlet distribution
ex_data <- 
  gtools::rdirichlet(500, c(37, 5, 13, 120)) %>%
  as_tibble(.name_repair = "universal") %>%
  rename_with(~str_replace(.x, "...", "x")) %>%
  add_column(n = round(rnorm(500, 750, 20))) %>%
  mutate(across(starts_with("x"), ~round(.x * n))) %>%
  select(-n)
#> New names:
#> • `` -> `...1`
#> • `` -> `...2`
#> • `` -> `...3`
#> • `` -> `...4`

ex_data
#> # A tibble: 500 × 4
#>       x1    x2    x3    x4
#>    <dbl> <dbl> <dbl> <dbl>
#>  1   153    23    81   498
#>  2   156    18    65   500
#>  3   208     6    40   479
#>  4   142    25    73   524
#>  5   181     9    54   497
#>  6   139    21    40   512
#>  7   184    24    49   495
#>  8   137    10    39   544
#>  9   153     9    44   519
#> 10   186    21    42   464
#> # … with 490 more rows

# density function for dirichlet multinomial distribution;
# based on checks w/other implementations, this seems to be working as expected
ddirichlet.multinom <- function(x, alpha, log = FALSE) {
  
  a0 <- sum(alpha)
  n <- sum(x)
  const <- lgamma(a0) + lgamma(n + 1) - lgamma(n + a0)
  
  pr <- vector(length = length(x))
  
  for (i in 1:length(x)) {
    
    pr[i] <- lgamma(x[i] + alpha[i]) - lgamma(alpha[i]) - lgamma(x[i] + 1)
    
  }
  
  prob <- const * prod(pr)
  
  if (log) prob else exp(prob)
  
}


# negative log-likelihood function
# seems to get stuck at the lower bound!
ll <- function(a1, a2, a3, a4) {
  
  x1 <- ex_data$x1
  x2 <- ex_data$x2
  x3 <- ex_data$x3
  x4 <- ex_data$x4
  
  log_likelihood <- vector("double", length = length(x1))
  
  for (i in 1:length(x1)) {
    
    log_likelihood[i] <- 
      ddirichlet.multinom(
        c(x1[i], x2[i], x3[i], x4[i]),
        c(a1, a2, a3, a4),
        log = TRUE
      )
    
  }
  
  -sum(log_likelihood)
  
}

# ll max likelihood estimation 
# depending on the example, it either gets stuck at the lower limit or it
# varies wildly with the starting params
stats4::mle(
  ll,
  method = "L-BFGS-B",
  start = c(37, 5, 13, 120),
  lower = c(1.0001, 1.0001, 1.0001, 1.0001)
)
#> 
#> Call:
#> stats4::mle(minuslogl = ll, start = c(37, 5, 13, 120), method = "L-BFGS-B", 
#>     lower = c(1.0001, 1.0001, 1.0001, 1.0001))
#> 
#> Coefficients:
#>     a1     a2     a3     a4 
#> 1.0001 1.0001 1.0001 1.0001

Created on 2022-07-07 by the reprex package (v2.0.1)

EmilHvitfeldt
  • 2,555
  • 1
  • 9
  • 12
Mark Rieke
  • 306
  • 3
  • 13
  • first it does not seem that your ddirichlet.multinomial works. It gives zero. looking at wikipedia, the function there does not give 0 – Onyambu Jul 08 '22 at 04:15
  • 1
    havin `n <- rowSums(ex_data)` you could do `a<-log(n)+lbeta(n, sum(par)) - rowSums(log(ex_data) + t(lbeta(t(ex_data), par)))` to obtain the `ddir_mult` values. Replace NA/inf with 0. Note that this is different from your loop function – Onyambu Jul 08 '22 at 04:39
  • Oh wow, thank you!! I had been looking in the wrong area all of yesterday. I wonder where I went wrong with the initial setup for `dirichlet.multiom()` --- I though I had implemented the gamma version of the function (& it looked like it matched [this python implementation](https://datascience.oneoffcoder.com/dirichlet-multinomial-distribution.html), but the beta version here works well – Mark Rieke Jul 08 '22 at 13:18
  • Do you still need a solution or the comment is enough? – Onyambu Jul 08 '22 at 13:21
  • It looks like that addresses the lower limit push - now getting `NaN` for rows with a count of 0 (I believe this is coming from the `log(x)` call). From the [wiki](https://en.wikipedia.org/wiki/Dirichlet-multinomial_distribution), the beta form "emphasizes the fact that 0 count categories can be ignored in the calculation." I've dug through [some of the referenced docs](https://cseweb.ucsd.edu//~elkan/edcm.pdf) and they say to ignore, but not explicitly how in terms of code. I'm wondering if `-Inf` can just be replaced with `0` in any vectors where it appears? – Mark Rieke Jul 08 '22 at 14:14

0 Answers0