Is this R code of Rao score test for the Bernoulli data model correct?

Question

I am a complete statistical noob and new to R, hence the question. I've tried to find an implementation of the Rao score for the particular case when one's data is binary and each observation has bernoulli distribution. I stumbled upon anova in the R language but failed to understand how to use that. Therefore, I tried implementing Rao score for this particular case myself:

rao.score.bern <- function(data, p0) {
  # assume `data` is a list of 0s and 1s
  y <- sum(data)
  n <- length(data)
  phat <- y / n

  z <- (phat - p0) / sqrt(p0 * (1 - p0) / n)
  p.value <- 2 * (1 - pnorm(abs(z)))
}

I am pretty sure that there is a bug in my code because it produces only two distinct p-values in the following scenario:

p0 <- 1 / 4
p <- seq(from=0.01, to=0.5, by=0.01)
n <- seq(from=5, to=70, by=1)
g <- expand.grid(n, p)

data <- apply(g, 1, function(x) rbinom(x[1], 1, x[2]))
p.values <- sapply(data, function(x) rao.score.bern(x[[1]], p0))

Could someone please show me where the problem is? Could you perhaps point me to a built-in solution in R?

score 5 · Accepted Answer · edited Jun 20 '20 at 09:12

First test, then debug.

Test

Does rao.score.bern work at all?

rao.score.bern(c(0,0,0,1,1,1), 1/6))

This returns...nothing! Fix it by replacing the ultimate line by

2 * (1 - pnorm(abs(z)))

This eliminates the unnecessary assignment.

rao.score.bern(c(0,0,0,1,1,1), 1/6))

[1] 0.02845974

OK, now we're getting somewhere.

Debug

Unfortunately, the code still doesn't work. Let's debug by yanking the call to rao.score.bern and replacing it by something that shows us the input. Don't apply it to the large input you created! Use a small piece of it:

sapply(data[1:5], function(x) x[[1]])

[1] 0 0 0 0 0

That's not what you expected, is it? It's returning just one zero for each element of data. What about this?

sapply(data[1:5], function(x) x)

[[1]]
[1] 0 0 0 0 0
[[2]]
[1] 0 0 0 0 0 0 
...
[[5]]
[1] 0 0 0 0 0 0 0 0 0

Much better! The variable x in the call to sapply refers to the entire vector, which is what you want to pass to your routine. Whence

p.values <- sapply(data, function(x) rao.score.bern(x, p0)); hist(p.values)

This is a great answer, explaining a strategy for identifying the problems, rather than just pointing out what the present problems are. — Glen_b, Feb 25 '15 at 21:28

Is this R code of Rao score test for the Bernoulli data model correct?

1 Answers1

Test

Debug