I am trying to compare categorical data from 2 groups.
       Yes  No
GrpA: [152, 220]
GrpB: [187, 350]
However, I am getting different p-values depending on which method I use:
# Import the submodules explicitly; a bare "import statsmodels" does not load statsmodels.stats.proportion
import scipy.stats
import statsmodels.stats.proportion

count = [152, 220]
nobs = [187, 350]

# USING STATSMODELS PACKAGE:
res = statsmodels.stats.proportion.proportions_chisquare(count, nobs)
print("P value using proportions_chisquare =", res[1])
res = statsmodels.stats.proportion.proportions_ztest(count, nobs)
print("P value using proportions_ztest=", res[1])

# USING SCIPY.STATS PACKAGE:
res = scipy.stats.chi2_contingency([count, nobs], correction=True)
print("P value using chi2_contingency with correction=", res[1])
res = scipy.stats.chi2_contingency([count, nobs], correction=False)
print("P value using chi2_contingency without correction=", res[1])
Output is:
P value using proportions_chisquare = 1.037221289479458e-05
P value using proportions_ztest= 1.0372212894794536e-05
P value using chi2_contingency with correction= 0.0749218380702875
P value using chi2_contingency without correction= 0.06421435896354544
The first two are essentially identical (and highly significant), but they differ from the last two (non-significant).
Why are the results different? Which is the correct method to do this analysis?
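For comparison, here is a minimal sketch of one way the two approaches might be put on the same footing. It assumes the table above holds Yes/No counts per group, so that the "successes" are the Yes column and the number of observations is each row total. The names `table`, `yes_counts` and `group_totals` are just illustrative and do not appear in my code above.

```python
import numpy as np
from scipy.stats import chi2_contingency
from statsmodels.stats.proportion import proportions_ztest

# Assumption: rows are groups (GrpA, GrpB), columns are (Yes, No) counts.
table = np.array([[152, 220],
                  [187, 350]])

yes_counts = table[:, 0]          # number of "Yes" answers in each group
group_totals = table.sum(axis=1)  # total observations (Yes + No) in each group

# Two-sample z-test on the proportion of "Yes" in each group.
z_stat, p_ztest = proportions_ztest(yes_counts, group_totals)

# Pearson chi-square test on the same 2x2 table, without continuity correction.
chi2_stat, p_chi2, dof, expected = chi2_contingency(table, correction=False)

print("P value using proportions_ztest =", p_ztest)
print("P value using chi2_contingency without correction =", p_chi2)
```

Under that reading, the z-test and the uncorrected chi-square test should give the same p-value, since for a 2x2 table the squared pooled z statistic equals the Pearson chi-square statistic. If the data are instead meant as successes out of totals (152 of 187 and 220 of 350), the contingency table passed to `chi2_contingency` would need successes and failures as its columns rather than counts and totals.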