1

I am trying to integrate a very simple likelihood function (three Bernoulli trials) in R, but it seems that I have not understood how integrate function works. My code is as follows:

integrate(
    function(theta) prod(
        dbinom(
            x = c(1,0,1), #x is the data: "success, fail, success"
            size = 1, #Just one Bernoulli trial times the length of x.
            prob = theta, #Since, this is a likelihood function, the parameter value is the variable
            log = F #We use a likelihood instead of log-likelihood
        )
    ),
    lower = 0, #The parameter theta cannot be smaller than zero...
    upper = 1 #...nor greater than one.
)

However, I get this error message:

Error in integrate(function(theta) prod(dbinom(x = c(1, 0, 1), size = 1,  : 
  evaluation of function gave a result of wrong length

Now, why is the result of wrong length? The result is always of length 1, since the anonymous function uses prod function, which in turn creates a product of all the vector elements the function dbinom returns (this vector has the length of three, since its first argument x has length of three).

What the result should be then to be of right length?

  • So, the integrate function attempts to input a vector of values to the prob argument, and this is where it fails, since only single values are accepted by dbinom. Now, I understand this. – Aku-Ville Lehtimäki Jul 31 '23 at 09:20
  • That's not quite correct, @Aku-VilleLehtimäki. The `x` in `dbinom` can take vectorized input. Try `dbinom(x = c(1, 2, 3), 3, 0.1)`. What precisely are you trying to calculate. Can you write it in pseudo-math, perhaps? This StackExchange site doesn't have LateX-style markup, unfortunately. – Avraham Jul 31 '23 at 10:17

2 Answers2

3

The question code is wrong in two points:

  • You must optimise, not integrate;
  • a product of many factors in (0, 1) will become close to zero, that's why you need to sum the log-likelihood.
fn <- function(theta, x) sum(dbinom(x, size = 1, prob = theta, log = TRUE))

MLE <- function(x) {
  f <- function(theta, x) fn(theta, x)
  optimise(f, c(0, 1), x = x, maximum = TRUE)$maximum
}

# the question's sample
MLE(c(1, 0, 1))
#> [1] 0.666675

# other samples, increasingly larger
set.seed(2023)
x <- rbinom(10, 1, p = 2/3)
MLE(x)
#> [1] 0.9999339

x2 <- rbinom(100, 1, p = 2/3)
MLE(x2)
#> [1] 0.5600005

x3 <- rbinom(1000, 1, p = 2/3)
MLE(x3)
#> [1] 0.691006

x4 <- rbinom(10000, 1, p = 2/3)
MLE(x4)
#> [1] 0.6746081

Created on 2023-07-31 with reprex v2.0.2


Edit

I had misunderstood the question, I thought it was asking for the MLE given the sample c(1, 0, 1).
Here is a corrected sum/log version, which is nothing more than the solution in Avraham's answer. The result and absolute error are the same.

fn <- function(theta) sum(dbinom(c(1, 0, 1), size = 1, prob = theta, log = TRUE)) |> exp()
integrate(Vectorize(fn), 0, 1)
#> 0.08333333 with absolute error < 9.3e-16

Created on 2023-08-01 with reprex v2.0.2

Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
  • I do not understand what a MLE has to do with my question. The MLE is indeed the maximum of the likelihood function, but my question was about integrating the likelihood function. (The maximum could be obtained analytically etc. in this case, but that's another story.) – Aku-Ville Lehtimäki Aug 01 '23 at 09:04
  • 1
    @Aku-VilleLehtimäki Thanks, I had misunderstood the question. Edited. – Rui Barradas Aug 01 '23 at 11:06
2

The issue here is not with dbinom but with prod. dbinom is a vectorized function but prod, according to its definition, returns a "a numeric (of type "double") or complex vector of length one."

As noted in the comments, the simplest approach is to simply wrap your function in Vectorize inside the integrate call. For example:

fn <- function(theta) prod(dbinom(c(1, 0, 1), size = 1, prob = theta, log = FALSE))
integrate(Vectorize(fn), 0, 1)
0.08333333 with absolute error < 9.3e-16
Avraham
  • 1,655
  • 19
  • 32
  • 1
    This is wrong, you must optimise, not integrate the likelihood, see my answer. – Rui Barradas Jul 31 '23 at 11:11
  • Hi, @RuiBarradas. You may be correct that the OP's initial question is based on a misunderstanding, but it remains correct to say that based on the original question, the issue is that `prod` is scalar and `integrate` requires vectorization. Which makes my answer the correct answer to the wrong question (a Type III error as it were :) ). – Avraham Jul 31 '23 at 12:38
  • 1
    As it turns out this is the right answer, see the [OP's comment](https://stackoverflow.com/questions/76801983/calculating-an-integral-of-a-bernoulli-likelihood-function-in-r/76803032?noredirect=1#comment135411373_76803032) to my answer. Upvote. – Rui Barradas Aug 01 '23 at 11:08