Chi square approximation may be incorrect

Question

I have a dataset that looks like this:

> dput(THSWP1_23)
structure(list(`Town District` = c(1, 2, 3, 4, 5, 6, 7, 8, 9), 
`health score 1` = c(50, 236, 215, 277, 261, 333, 414, 385, 
358), `Health score 2 and 3` = c(51, 238, 218, 281, 266, 
339, 421, 393, 367)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -9L))

I want to run a chi-square test to see if there is a significant difference between Health score 1 and Health score 2+3 for every Town district.

I have tried the following calls:

> chisq.test(THSWP1_23$`Town District`,THSWP1_23$`health score 1`) 
> chisq.test(THSWP1_23$`Town District`,THSWP1_23$`Health score 2 and 3`)
> chisq.test(THSWP1_23$`health score 1`,THSWP1_23$`Health score 2 and 3`)

and all three calls give the same output:

Pearson's Chi-squared test 

data:  THSWP1_23$`health score 1` and THSWP1_23$`Health score 2 and 3` 
X-squared = 72, df = 64, p-value = 0.2303 
 
Warning message: 
In chisq.test(THSWP1_23$`health score 1`, THSWP1_23$`Health score 2 and 3`) : 
  Chi-squared approximation may be incorrect

I can't figure out why the same X-squared, df and p-value are returned when the variables are combined in three different ways.

I also keep getting the approximation warning. Is this due to the data itself or to faulty code?

What exactly is the hypothesis you are trying to test here? Are you trying to get one p-value per Town District? Or are you trying to test whether any Town District is different from any other? It might be better to first ask for statistical help at [stats.se] to make sure you are choosing the right test for your hypothesis. Normally you want to pass a matrix to `chisq.test`, which works well when you have two variables measured, but is harder when you have three. - MrFlick, Mar 10 '23 at 15:21

score 3 · Accepted Answer · answered Mar 10 '23 at 15:29

When you pass it two vectors, the chisq.test function constructs a contingency table by cross-tabulating them. But your data frame (in the code below I call it df) is already a table of counts, so cross-tabulating its columns is wrong. The table you are actually testing in your first call is the one below:

with(df, table(`Town District`, `health score 1`))
             health score 1
Town District 50 215 236 261 277 333 358 385 414
            1  1   0   0   0   0   0   0   0   0
            2  0   0   1   0   0   0   0   0   0
            3  0   1   0   0   0   0   0   0   0
            4  0   0   0   0   1   0   0   0   0
            5  0   0   0   1   0   0   0   0   0
            6  0   0   0   0   0   1   0   0   0
            7  0   0   0   0   0   0   0   0   1
            8  0   0   0   0   0   0   0   1   0
            9  0   0   0   0   0   0   1   0   0

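which is not what you want. Every one of your three calls builds a 9 x 9 table like this, with a single 1 in each row and each column, so they all give X-squared = 72 with df = 64, and the warning appears because every expected count is only 1/9. Just do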

chisq.test(df[,2:3])

    Pearson's Chi-squared test

data:  df[, 2:3]
X-squared = 0.024654, df = 8, p-value = 1

which I think is what you want. The conclusion is quite obvious from

round(df[[2]] / df[[3]], 2)
[1] 0.98 0.99 0.99 0.99 0.98 0.98 0.98 0.98 0.98
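
For anyone who wants to check this themselves, here is a minimal sketch that rebuilds the data from the question as a plain data frame (df, as above) and compares the two ways of calling chisq.test. The names tab, counts, res_vectors and res_counts are only illustrative. As far as I know, R raises the warning whenever some expected count is below 5, which would explain why it shows up for the cross-tabulated vectors but not for the matrix of counts.

```r
# Data from the question, rebuilt as a plain data frame.
df <- data.frame(
  `Town District`        = 1:9,
  `health score 1`       = c(50, 236, 215, 277, 261, 333, 414, 385, 358),
  `Health score 2 and 3` = c(51, 238, 218, 281, 266, 339, 421, 393, 367),
  check.names = FALSE
)

# What the two-vector call does: cross-tabulate the values themselves.
# All nine values in each column are distinct, so this is a 9 x 9 table
# holding only nine "observations" (one per row of df).
tab <- table(df$`health score 1`, df$`Health score 2 and 3`)
dim(tab)
sum(tab)

res_vectors <- suppressWarnings(chisq.test(tab))
res_vectors$statistic        # X-squared = 72, as in the question
res_vectors$parameter        # df = 64
range(res_vectors$expected)  # every expected count is 1/9

# What is actually wanted: treat the two count columns as the table.
counts <- as.matrix(df[, 2:3])
rownames(counts) <- df$`Town District`

res_counts <- chisq.test(counts)
res_counts$statistic         # tiny, the districts look alike
res_counts$parameter         # df = 8
min(res_counts$expected)     # roughly 50, so no warning here

# Share of "health score 1" within each district.
round(prop.table(counts, margin = 1)[, 1], 3)
```

The row shares in the last line are nearly identical across districts, which tells the same story as the ratios above.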
