chi-square results not equal if expected proportions used instead of counts in R

Question

I am testing goodness of fit between two samples of the same fact, taken in different months. I want to know whether the results in 4 categories from September are similar to or different from those from October.

The issue is that my test gives different results depending on whether I use a contingency table or a test against expected proportions, taking the previous month's proportions as the expected ones.

Sample data:

data <- data.frame(september = c(10741, 1575, 174, 2),
                   october = c(11987, 1705, 211, 2),
                   row.names = c("A", "B", "C", "D"))
> data
  september october
A     10741   11987
B      1575    1705
C       174     211
D         2       2

Testing the usual way, using a contingency table:

> chisq.test(data)

Pearson's Chi-squared test

data:  data
X-squared = 1.3846, df = 3, p-value = 0.7092

Calculating proportions from September and setting them as the expected probabilities:

> p <- prop.table(data$september)
> p
[1] 0.8598302914 0.1260806916 0.0139289145 0.0001601025

> chisq.test(x = data$october, p = p)

Chi-squared test for given probabilities

data:  data$october
X-squared = 2.9748, df = 3, p-value = 0.3955

Why is there such a difference between the tests? Which one is wrong? I assumed the two strategies would lead to the same result, so there seems to be a mistake somewhere.

It's a methodological difference: the expected proportions are different in each case. In the first example, the expected proportions are estimated from September and October pooled together. In the second, you set September's proportions as the expected ones and treat them as known. That is the source of the difference. — Brutalroot, Nov 11 '20 at 17:07
So @Brutalroot, if I get your point, for comparing September vs. October the first method is the more accurate one? — Forge, Nov 11 '20 at 19:52
It depends. If you do not know the expected frequencies, yes. If you want to test independence between month and category (that is, whether September and October share the same distribution), you should also use the first method. However, if you know the expected frequencies and want to test whether October fits them, you should use method 2. — Brutalroot, Nov 12 '20 at 11:31
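To make the point about expected proportions concrete, here is a minimal sketch that recomputes both statistics by hand in base R and checks them against chisq.test(). The variable names (obs, expected_pooled, expected_sept, and so on) are illustrative and do not come from the original post; only the sample data and the chisq.test() calls do.

```r
# Sample data from the question
data <- data.frame(september = c(10741, 1575, 174, 2),
                   october   = c(11987, 1705, 211, 2),
                   row.names = c("A", "B", "C", "D"))
obs <- as.matrix(data)

# Method 1 (contingency table): expected counts are built from the pooled
# row totals, scaled to each month's column total.
expected_pooled <- outer(rowSums(obs), colSums(obs)) / sum(obs)
stat_table <- sum((obs - expected_pooled)^2 / expected_pooled)

# Method 2 (given probabilities): September's proportions are treated as
# fixed and known, and only October's counts enter the statistic.
p <- prop.table(data$september)
expected_sept <- sum(data$october) * p
stat_gof <- sum((data$october - expected_sept)^2 / expected_sept)

# Compare against chisq.test(); warnings about small expected counts
# (category D) are suppressed here for brevity.
fit_table <- suppressWarnings(chisq.test(obs))
fit_gof   <- suppressWarnings(chisq.test(x = data$october, p = p))

all.equal(stat_table, unname(fit_table$statistic))
all.equal(stat_gof, unname(fit_gof$statistic))
```

Both comparisons should return TRUE. The first statistic sums over both months using pooled proportions. The second sums over October only and ignores the sampling variability in September's proportions, which is consistent with it coming out larger in the question's output (2.9748 vs. 1.3846).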
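One way to see why the choice of method matters is to swap the roles of the two months. The sketch below assumes neither month has a privileged role as the reference. The object names (tab_forward, gof_oct_vs_sept, and so on) are illustrative and not part of the original post.

```r
# Sample data from the question
data <- data.frame(september = c(10741, 1575, 174, 2),
                   october   = c(11987, 1705, 211, 2),
                   row.names = c("A", "B", "C", "D"))

# Method 1: swapping the columns leaves the statistic unchanged.
tab_forward  <- suppressWarnings(chisq.test(as.matrix(data)))
tab_reversed <- suppressWarnings(
  chisq.test(as.matrix(data[, c("october", "september")])))

# Method 2: the result depends on which month supplies the reference
# proportions and which month supplies the observed counts.
gof_oct_vs_sept <- suppressWarnings(
  chisq.test(x = data$october, p = prop.table(data$september)))
gof_sept_vs_oct <- suppressWarnings(
  chisq.test(x = data$september, p = prop.table(data$october)))

c(table_forward  = unname(tab_forward$statistic),
  table_reversed = unname(tab_reversed$statistic))
c(oct_vs_sept = unname(gof_oct_vs_sept$statistic),
  sept_vs_oct = unname(gof_sept_vs_oct$statistic))
```

The two contingency-table statistics should be identical, while the two given-probabilities statistics should differ. If September's proportions are themselves estimated from a sample rather than known in advance, this asymmetry suggests the contingency-table test is the more natural fit for comparing the two months.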
