
I am trying to use an iterative procedure to estimate the marginal likelihood in a Bayesian setting for model selection. In case you are interested in the specifics, see this or this paper, for instance. I have been fighting numerical issues for quite some time now, so I thought I might as well give it a shot here and try my luck. I have tried to make what follows as short, reproducible and general as possible.

The Setting

The goal is to run an iterative process of the form

$$
y^{(t)} \;=\; \frac{\frac{1}{L}\sum_{l=1}^{L} \dfrac{p_l}{L\,q_l + \frac{M}{y^{(t-1)}}\,p_l}}{\frac{1}{M}\sum_{m=1}^{M} \dfrac{q_m}{L\,q_m + \frac{M}{y^{(t-1)}}\,p_m}}
$$

to get an estimate for the marginal likelihood y of my model (this iterative process converges rather quickly to some value for y). All other variables in the formula are scalars or parameter likelihoods that have already been computed, so they are fixed values; the only term that changes from one iteration to the next is the value of y. So basically I have vectors p_l and q_l of length L and vectors p_m and q_m of length M. So far so good.

The Problem

The likelihood values used in the formula above are unfortunately not log-likelihoods (which are what I actually compute and store in the vectors), but the likelihoods themselves. This is the source of my numerical issues. You probably know the old problem of exponentiating log-likelihoods that are large in magnitude and negative: my log-likelihoods are so negative that exp(log-likelihood) almost always evaluates to 0. There are already some similar questions with answers online, but none of those solutions did the trick for me.
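
To illustrate what I mean (the -1000 here is just an arbitrary magnitude picked for the example, not one of my actual values):

exp(-1000)                 # 0: underflows, as does anything below about -745
exp(-1000) / exp(-1001)    # NaN (0/0), although the true ratio is exp(1)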

What I tried

I think what might be promising is to exploit the fact that exp(log(x)) = x and to expand the fraction, so the formula above can be rewritten as

$$
y^{(t)} \;=\; \frac{\frac{1}{L}\sum_{l=1}^{L} \dfrac{\exp(\log p_l - C)}{L\,\exp(\log q_l - C) + \frac{M}{y^{(t-1)}}\,\exp(\log p_l - C)}}{\frac{1}{M}\sum_{m=1}^{M} \dfrac{\exp(\log q_m - C)}{L\,\exp(\log q_m - C) + \frac{M}{y^{(t-1)}}\,\exp(\log p_m - C)}}
$$

where C is a constant of your choice. For a proof that follows the same idea, see the appendix of this paper. If one can find a suitable value of C that makes the terms in the sums manageable, the problem is solved. However, in my case p_l, q_l, p_m and q_m differ greatly in magnitude, so no matter what value of C I subtract or add, I end up with underflow or overflow again. So right now I don't really know how to proceed from here. I appreciate any comments and tips.
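
To make this concrete, here is a minimal sketch of one such rescaled iteration in R. The names lp_l, lq_l, lp_m and lq_m are mine for the stored log-likelihood vectors, and C is simply taken to be the overall maximum, which is only one possible choice:

# lp_l, lq_l, lp_m, lq_m: log-likelihood vectors (assumed already loaded)
C <- max(c(lp_l, lq_l, lp_m, lq_m))   # one global shift, free to choose

one_iteration <- function(y_prev) {
  # numerator and denominator of every term are both multiplied by exp(-C),
  # so the value of the term is unchanged; only the intermediate magnitudes shift
  eval_upper  <- exp(lp_l - C) / (L * exp(lq_l - C) + (M / y_prev) * exp(lp_l - C))
  eval_downer <- exp(lq_m - C) / (L * exp(lq_m - C) + (M / y_prev) * exp(lp_m - C))
  mean(eval_upper) / mean(eval_downer)
}

With log-likelihoods as spread out as mine, though, no single C keeps all of exp(lp_l - C), exp(lq_l - C), exp(lp_m - C) and exp(lq_m - C) away from 0 and Inf at the same time, which is exactly where I am stuck.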

Code and Data

You can find some example data (i.e. the vectors with the log likelihoods) here. The code for the iterative process is

# p_l, q_l, p_m, q_m: likelihood vectors from the linked example data (assumed loaded)
L <- 1000                            # length of p_l and q_l
M <- 5000                            # length of p_m and q_m
y <- numeric(length = 100)           # trace of the estimate over the iterations
eval_downer <- numeric(length = M)   # one term per m
eval_upper  <- numeric(length = L)   # one term per l
y[1] <- 0.5                          # starting value

for (t in 2:100) {

  # terms of the denominator sum
  for (m in 1:M) {
    up.m   <- q_m[m]
    down.m <- (L * q_m[m]) + (M * p_m[m] / y[t - 1])
    eval_downer[m] <- up.m / down.m
  }

  # terms of the numerator sum
  for (l in 1:L) {
    up.l   <- p_l[l]
    down.l <- (L * q_l[l]) + (M * p_l[l] / y[t - 1])
    eval_upper[l] <- up.l / down.l
  }

  upper  <- mean(eval_upper)
  downer <- mean(eval_downer)

  y[t] <- upper / downer

  print(t)
}

Thank you!

yrx1702

1 Answer


It's best to work in logs, I think. I'll work with

log(M * p_m[m] / y[t-1])

etc. You have sums of terms of the form

t = exp(a) / (exp(b) + exp(c)) = 1 / (exp(b-a) + exp(c-a))

The log lt of t is

lt = (a - M) - log1p(exp(m - M))

where

m = min(b,c), M = max(b,c)

Note that m-M is not positive; if it's very negative exp(m-M) might underflow to 0 but that's ok as we are adding it to 1.
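
In R this per-term computation could look like the following sketch (log_ratio_terms is just a name I made up; pmax/pmin make it work elementwise on whole vectors as well as on scalars):

# log of exp(a) / (exp(b) + exp(c)), computed without leaving log space
log_ratio_terms <- function(a, b, c) {
  hi <- pmax(b, c)   # the larger exponent (M in the notation above)
  lo <- pmin(b, c)   # the smaller exponent (m in the notation above)
  (a - hi) - log1p(exp(lo - hi))
}

For eval_downer[m], for example, a is the stored log-likelihood log(q_m[m]), b is log(L) + log(q_m[m]), and c is log(M) + log(p_m[m]) - log(y[t-1]).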

Then we have sums of the form

s = Sum{ exp(l[i])}

Similarly its log ls is

ls = L + log( Sum{ exp(l[i] - L) })

where L here is the maximum of the l[i] (not the L from your code). We can now evaluate the exponentials without fear of overflow.
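
Putting the two pieces together, one full update of y could be done entirely in log space along these lines (a sketch only; lp_l, lq_l, lp_m, lq_m stand for the stored log-likelihood vectors, and log_ratio_terms is the helper from above):

# numerically stable log(sum(exp(x)))
log_sum_exp <- function(x) {
  xmax <- max(x)
  xmax + log(sum(exp(x - xmax)))
}

# one update y_prev -> y_new of the iteration from the question, all in logs;
# lp_l, lq_l have length L and lp_m, lq_m have length M
update_y <- function(y_prev, lp_l, lq_l, lp_m, lq_m, L, M) {
  l_up   <- log_ratio_terms(lp_l, log(L) + lq_l, log(M) + lp_l - log(y_prev))
  l_down <- log_ratio_terms(lq_m, log(L) + lq_m, log(M) + lp_m - log(y_prev))
  # log of a mean = log_sum_exp of the terms minus log of the number of terms
  exp((log_sum_exp(l_up) - log(L)) - (log_sum_exp(l_down) - log(M)))
}

If y itself can become extremely small or large, the same idea applies one level up: keep log(y) between iterations instead of exponentiating at the end.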

dmuir