
I am trying to compare categorical data from 2 groups.

       Yes  No
GrpA: [152, 220]
GrpB: [187, 350]

However, I am getting different P value results when using different methods:

count = [152, 220]
nobs = [187, 350]

import statsmodels.stats.proportion
import scipy.stats

# USING STATSMODELS PACKAGE:
res = statsmodels.stats.proportion.proportions_chisquare(count, nobs)
print("P value using proportions_chisquare =", res[1])
res = statsmodels.stats.proportion.proportions_ztest(count, nobs)
print("P value using proportions_ztest =", res[1])

# USING SCIPY.STATS PACKAGE:
res = scipy.stats.chi2_contingency([count, nobs], correction=True)
print("P value using chi2_contingency with correction =", res[1])
res = scipy.stats.chi2_contingency([count, nobs], correction=False)
print("P value using chi2_contingency without correction =", res[1])

Output is:

P value using proportions_chisquare = 1.037221289479458e-05
P value using proportions_ztest = 1.0372212894794536e-05

P value using chi2_contingency with correction = 0.0749218380702875
P value using chi2_contingency without correction = 0.06421435896354544

The first two are essentially identical (and highly significant), but they differ from the last two (non-significant).

Why are the results different? Which is the correct method for this analysis?

  • In the `statsmodels` functions, `nobs` should be *the number of trials or observations*, but it looks like you have assigned the `GrpB` values to `nobs`. – Warren Weckesser Jul 01 '20 at 13:30
  • Yes, it works with the total number of observations (rather than GrpB only). Thanks. – rnso Jul 01 '20 at 13:35
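
Following Warren Weckesser's comment, here is a minimal sketch of the corrected calls. It assumes `count` holds the "Yes" counts for the two groups and `nobs` holds each group's total number of observations (Yes + No); `scipy.stats.chi2_contingency` instead takes the 2x2 table directly:

import scipy.stats
from statsmodels.stats.proportion import proportions_chisquare, proportions_ztest

# "Yes" counts per group, and TOTAL observations per group (Yes + No)
count = [152, 187]               # Yes in GrpA, Yes in GrpB
nobs = [152 + 220, 187 + 350]    # group totals: 372 and 537

res = proportions_chisquare(count, nobs)
print("P value using proportions_chisquare =", res[1])

res = proportions_ztest(count, nobs)
print("P value using proportions_ztest =", res[1])

# scipy works on the 2x2 contingency table (rows = groups, columns = Yes/No)
table = [[152, 220], [187, 350]]
res = scipy.stats.chi2_contingency(table, correction=False)
print("P value using chi2_contingency without correction =", res[1])

With these inputs all three calls should report essentially the same p-value (about 0.064, the chi2_contingency result without correction shown above), since for a 2x2 table the uncorrected chi-square statistic is the square of the pooled two-proportion z statistic.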
