I have a dataset that looks like this:
> dput(THSWP1_23)
structure(list(`Town District` = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
`health score 1` = c(50, 236, 215, 277, 261, 333, 414, 385,
358), `Health score 2 and 3` = c(51, 238, 218, 281, 266,
339, 421, 393, 367)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -9L))
I want to run a chi-square test to see whether health score 1 and health score 2+3 differ significantly in each town district.
I have tried the following calls:
> chisq.test(THSWP1_23$`Town District`,THSWP1_23$`health score 1`)
> chisq.test(THSWP1_23$`Town District`,THSWP1_23$`Health score 2 and 3`)
> chisq.test(THSWP1_23$`health score 1`,THSWP1_23$`Health score 2 and 3`)
and all three calls give the same output:
Pearson's Chi-squared test
data: THSWP1_23$`health score 1` and THSWP1_23$`Health score 2 and 3`
X-squared = 72, df = 64, p-value = 0.2303
Warning message:
In chisq.test(THSWP1_23$`health score 1`, THSWP1_23$`Health score 2 and 3`) :
Chi-squared approximation may be incorrect
I can't figure out why the same X-squared, df and p-value are returned when the variables are combined in three different ways.
I also keep getting the approximation warning. Is this due to the data itself or to faulty code?
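For reference, here is a minimal sketch of what the calls above seem to be doing. When `chisq.test(x, y)` is given two vectors, it treats them as factors and cross-tabulates them. All nine values in every column are distinct, so each pair of columns produces the same 9 x 9 table with a single 1 in each row and column. The second half of the sketch assumes the two score columns are counts per district (an assumption, not something confirmed by the data) and passes them as a 9 x 2 matrix of counts instead; `tab`, `dup`, `counts` and `res` are illustrative names.

```r
# Data rebuilt from the dput() output above
THSWP1_23 <- data.frame(
  `Town District` = 1:9,
  `health score 1` = c(50, 236, 215, 277, 261, 333, 414, 385, 358),
  `Health score 2 and 3` = c(51, 238, 218, 281, 266, 339, 421, 393, 367),
  check.names = FALSE
)

# What chisq.test(x, y) builds internally: a cross-tabulation of the two
# vectors treated as factors. Nine distinct values on each side give a
# 9 x 9 table holding only nine observations in total.
tab <- table(THSWP1_23$`health score 1`, THSWP1_23$`Health score 2 and 3`)
dim(tab)  # 9 9
sum(tab)  # 9

# Testing that sparse table reproduces X-squared = 72 with df = 64,
# along with the approximation warning (suppressed here).
dup <- suppressWarnings(chisq.test(tab))
dup$statistic
dup$parameter

# Assumption: both columns are counts per district. In that case the
# counts themselves form the contingency table (9 districts x 2 groups).
counts <- as.matrix(THSWP1_23[, c("health score 1", "Health score 2 and 3")])
rownames(counts) <- THSWP1_23$`Town District`
res <- chisq.test(counts)
res
```

If the columns are not counts (for example, if they are averaged scores), a chi-square test on them would not be appropriate and the second half of the sketch does not apply.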