3

I am trying to graph the null distribution in a chi square test.In R it is feasible to do monte carlo simulation to get the empirical p value using the code:

chisq.test(d,simulate.p.value=TRUE,B=10000)

But it does not return the distribution graph. Is there any way to make R to return the simulated values for the test?

Bhargav Rao
  • 50,140
  • 28
  • 121
  • 140
Masih
  • 920
  • 2
  • 19
  • 36

1 Answers1

4

If you look at the function definition for chisq.test (around line 56 of capture.output(chisq.test)) you'll come to the simulation section:

    if (simulate.p.value && all(sr > 0) && all(sc > 0)) {
        setMETH()
        tmp <- .Call(C_chisq_sim, sr, sc, B, E)
        STATISTIC <- sum(sort((x - E)^2/E, decreasing = TRUE))
        PARAMETER <- NA
        PVAL <- (1 + sum(tmp >= almost.1 * STATISTIC))/(B + 
            1)
    }

This is calling a C function. First generate some dummy data

## Some data
x <- as.table(rbind(c(762, 327, 468), c(484, 239, 477)))
dimnames(x) <- list(gender = c("F", "M"),
                party = c("Democrat","Independent", "Republican"))

Then grab the bit that you need

sr <- rowSums(x)
sc <- colSums(x)
n <- sum(x)
E <- outer(sr, sc, "*")/n
v <- function(r, c, n) c * r * (n - r) * (n - c)/n^3
V <- outer(sr, sc, v, n)
dimnames(E) <- dimnames(x)
B = 2000
tmp <- .Call(stats:::C_chisq_sim, sr, sc, B, E)
STATISTIC <- sum(sort((x - E)^2/E, decreasing = TRUE))
almost.1 <- 1 - 64 * .Machine$double.eps                                                  
PVAL <- (1 + sum(tmp >= almost.1 * STATISTIC))/(B + 1)

The variable tmp contains the output you want. The variable PVAL matches the output of

chisq.test(x, simulate.p.value = T, B=2000)$p.value

Note I've used ::: since the function C_chisq_sim isn't exported from stats.

csgillespie
  • 59,189
  • 14
  • 150
  • 185