I have a data frame for which I want to calculate a chi-squared statistic and p-value. However, when I print the expected frequencies, they are not what I expected. The null hypothesis I expected the code to test is that Q7 does not depend on 'ConcernImprovement', so I expected the expected frequencies for Decrease, Increase and No change to be the same within each Q7 row.
This is my observed data frame, which is called LikelihoodConcern:
ConcernImprovement  Decrease  Increase  No change
Q7
Likely                   2.0      18.0       21.0
Not likely at all        0.0       2.0        1.0
Not very likely          3.0      11.0        5.0
Somewhat likely          4.0      24.0       14.0
Very likely              1.0      16.0        8.0
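In case it helps to reproduce this, here is a minimal sketch that rebuilds the frame by hand. It assumes the counts are floats and that the index and column names are exactly as printed above; how the real frame was produced is not shown here.

```python
import pandas as pd

# Observed counts typed in by hand from the printed frame (assumption: floats)
LikelihoodConcern = pd.DataFrame(
    [
        [2.0, 18.0, 21.0],
        [0.0, 2.0, 1.0],
        [3.0, 11.0, 5.0],
        [4.0, 24.0, 14.0],
        [1.0, 16.0, 8.0],
    ],
    index=pd.Index(
        ["Likely", "Not likely at all", "Not very likely",
         "Somewhat likely", "Very likely"],
        name="Q7",
    ),
    columns=pd.Index(
        ["Decrease", "Increase", "No change"],
        name="ConcernImprovement",
    ),
)

print(LikelihoodConcern)
```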
I tried this code:
from scipy.stats import chi2_contingency
chi2, p, dof, expected = chi2_contingency(LikelihoodConcern, correction=False)
expected
It returns this for the expected frequencies:
array([[ 3.15384615, 22.39230769, 15.45384615],
       [ 0.23076923,  1.63846154,  1.13076923],
       [ 1.46153846, 10.37692308,  7.16153846],
       [ 3.23076923, 22.93846154, 15.83076923],
       [ 1.92307692, 13.65384615,  9.42307692]])
I expected it to return:
array([[ 13.66666667, 13.66666667, 13.66666667],
       [  1.00000000,  1.00000000,  1.00000000],
       [  6.33333333,  6.33333333,  6.33333333],
       [ 14.00000000, 14.00000000, 14.00000000],
       [  8.33333333,  8.33333333,  8.33333333]])
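To be explicit about where these numbers come from: I took each row total and split it equally across the three columns. A minimal sketch of that calculation, with the observed counts copied by hand into a plain list (the names `observed` and `uniform_expected` are only illustrative):

```python
# Observed counts copied from LikelihoodConcern
# (columns: Decrease, Increase, No change)
observed = [
    [2.0, 18.0, 21.0],   # Likely
    [0.0, 2.0, 1.0],     # Not likely at all
    [3.0, 11.0, 5.0],    # Not very likely
    [4.0, 24.0, 14.0],   # Somewhat likely
    [1.0, 16.0, 8.0],    # Very likely
]

# Split each row total equally across that row's columns
uniform_expected = [[sum(row) / len(row)] * len(row) for row in observed]

for row in uniform_expected:
    print(row)
```

This is only my own reasoning about what "no dependence" should look like, not anything taken from the scipy source.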
I have looked at the source code for the expected_freq function, as the documentation doesn't have much detail, but I still don't understand why I am not seeing what I expect.