
I'm running a chi-squared goodness-of-fit test. There are four vegetation types (A–D), each occupying a given % of the total study area, and the number of specimens was counted in each vegetation type. The question is whether the distribution of this plant species is proportional to the areas of the vegetation types or not. I ran the test in R and with an online calculator, but the results are very different, and only the online calculator returns the correct values (I know the answer).

A <- c(45, 4, 10, 59) #number of specimens in each vegetation, total 118 observations
B <- c(24, 17, 5, 54) #area of each vegetation = % of the total study area
C <- c(28.32, 20.06, 5.9, 63.72) #expected values (area % * 118)

chisq.test(A, C)

The output

    Pearson's Chi-squared test

data:  A and C
X-squared = 12, df = 9, p-value = 0.2133

Next, I reran the test with an online calculator (https://www.statology.org/chi-square-goodness-of-fit-test-calculator/) using my observed (A) and expected (C) data, and the result is:

X2 Test Statistic: 25.880627
p-value: 0.000010

This is also the correct answer. The question is: what am I doing wrong to have these two tests run so differently?

Rui Barradas
  • By the problem description you want `chisq.test(A, p = C/118)`. And the result matches your known result. See the help page `?chisq.test`. – Rui Barradas Apr 10 '23 at 19:42
  • Had a similar problem once. I think the key was a function argument to `chisq.test()` called 'exact' or something like that. – uke Apr 10 '23 at 19:59

1 Answer


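The input to `chisq.test()` is not what people expect. For a goodness-of-fit test, pass the observed counts as `x` and the expected probabilities as `p`, and set `rescale.p = TRUE` so that `p` is rescaled to sum to 1 (this lets you pass the expected counts directly).
Examine the `expected` component of the result to confirm the calculation makes sense.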

A <- c(45, 4, 10, 59) #number of specimens in each vegetation, total 118 observations
B <- c(24, 17, 5, 54) #area of each vegetation = % of the total study area
C <- c(28.32, 20.06, 5.9, 63.72) #expected values (area % * 118)

chi <- chisq.test(A, p=C, rescale.p = TRUE)
print(chi)
# Chi-squared test for given probabilities
# 
# data:  A
# X-squared = 25.881, df = 3, p-value = 1.01e-05
chi$expected
#[1] 28.32 20.06  5.90 63.72
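
As a cross-check, the statistic can also be reproduced by hand. This is only a minimal sketch using the same `A` and `B` as above; the names `p`, `expected`, `stat`, `pval` and `chi_b` are introduced here for illustration and are not part of the original code.

```r
A <- c(45, 4, 10, 59)   # observed counts per vegetation type
B <- c(24, 17, 5, 54)   # area of each vegetation type, in % of the study area

p <- B / sum(B)                # expected proportions (sum to 1)
expected <- p * sum(A)         # expected counts under proportionality
stat <- sum((A - expected)^2 / expected)   # Pearson chi-squared statistic
pval <- pchisq(stat, df = length(A) - 1, lower.tail = FALSE)

# The same test through chisq.test(), passing the proportions directly
chi_b <- chisq.test(A, p = p)
all.equal(unname(chi_b$statistic), stat)
```

Since `p` already sums to 1 here, `rescale.p` is not needed in this variant.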

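Calling `chisq.test(A, C)` does something different: when both `x` and `y` are vectors, R treats them as factors, cross-tabulates them into a 4 x 4 contingency table, and runs a test of independence (hence df = 9 in your output). That is not what you want.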

chi_wrong <- chisq.test(A, C)
chi_wrong$expected
#   C
# A   5.9 20.06 28.32 63.72
# 4  0.25  0.25  0.25  0.25
# 10 0.25  0.25  0.25  0.25
# 45 0.25  0.25  0.25  0.25
# 59 0.25  0.25  0.25  0.25 
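
To see where df = 9 comes from, it may help to build the contingency table explicitly. A rough sketch, assuming the same `A` and `C` as above; `tab` and `df_indep` are illustrative names, and `suppressWarnings()` is used only to hide the small-expected-count warning that this call raises.

```r
A <- c(45, 4, 10, 59)
C <- c(28.32, 20.06, 5.9, 63.72)

# Each distinct value becomes a factor level, so the table is 4 x 4
# with a single 1 in every row and column
tab <- table(A, C)
dim(tab)

# Degrees of freedom for a test of independence: (rows - 1) * (cols - 1)
df_indep <- (nrow(tab) - 1) * (ncol(tab) - 1)

chi_wrong <- suppressWarnings(chisq.test(A, C))
chi_wrong$parameter   # df reported by the test
```

The table only records which values of `A` and `C` occur together, so the actual counts never enter the statistic.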
Dave2e