You have a contingency table. To perform the χ2 test on this data, you can use scipy.stats.chi2_contingency
:
In [31]: from scipy.stats import chi2_contingency
In [32]: obs = np.array([[25, 75], [100, 100]])
In [33]: obs
Out[33]:
array([[ 25, 75],
[100, 100]])
In [34]: chi2, p, dof, expected = chi2_contingency(obs)
In [35]: p
Out[35]: 5.9148695289823149e-05
Your contingency table is 2x2, so you can use Fisher's exact test. This is implemented in scipy as scipy.stats.fisher_exact
:
In [148]: from scipy.stats import fisher_exact
In [149]: oddsr, pval = fisher_exact(obs)
In [150]: pval
Out[150]: 3.7175015403965242e-05
scipy doesn't have much more for contingency tables. It looks like the next release of statsmodels
will have more tools for the analysis of contingency tables, but that doesn't help right now.
It isn't hard to write some code to compute the difference in proportion and its 95% confidence interval. Here's one way:
# Include this if you are using Python 2.7. Or tweak the code in the
# function to ensure that division uses floating point.
from __future__ import division
def diffprop(obs):
"""
`obs` must be a 2x2 numpy array.
Returns:
delta
The difference in proportions
ci
The Wald 95% confidence interval for delta
corrected_ci
Yates continuity correction for the 95% confidence interval of delta.
"""
n1, n2 = obs.sum(axis=1)
prop1 = obs[0,0] / n1
prop2 = obs[1,0] / n2
delta = prop1 - prop2
# Wald 95% confidence interval for delta
se = np.sqrt(prop1*(1 - prop1)/n1 + prop2*(1 - prop2)/n2)
ci = (delta - 1.96*se, delta + 1.96*se)
# Yates continuity correction for confidence interval of delta
correction = 0.5*(1/n1 + 1/n2)
corrected_ci = (ci[0] - correction, ci[1] + correction)
return delta, ci, corrected_ci
For example,
In [22]: obs
Out[22]:
array([[ 25, 75],
[100, 100]])
In [23]: diffprop(obs)
Out[23]:
(-0.25,
(-0.35956733089748971, -0.14043266910251032),
(-0.36706733089748972, -0.13293266910251031))
The first value returned is the difference in proportions delta
. The next two pairs are the Wald 95% confidence interval for delta
, and the Wald 95% confidence interval with Yates continuity correction.
If you don't like those negative values, you can reverse the rows first:
In [24]: diffprop(obs[::-1])
Out[24]:
(0.25,
(0.14043266910251032, 0.35956733089748971),
(0.13293266910251031, 0.36706733089748972))
For comparison, here is a similar calculation in R:
> obs
[,1] [,2]
[1,] 25 75
[2,] 100 100
> prop.test(obs, correct=FALSE)
2-sample test for equality of proportions without continuity
correction
data: obs
X-squared = 17.1429, df = 1, p-value = 3.467e-05
alternative hypothesis: two.sided
95 percent confidence interval:
-0.3595653 -0.1404347
sample estimates:
prop 1 prop 2
0.25 0.50
> prop.test(obs, correct=TRUE)
2-sample test for equality of proportions with continuity correction
data: obs
X-squared = 16.1297, df = 1, p-value = 5.915e-05
alternative hypothesis: two.sided
95 percent confidence interval:
-0.3670653 -0.1329347
sample estimates:
prop 1 prop 2
0.25 0.50