I am a regular R user who is learning to do analysis in Python. I started with the chi-square test and ran the following:
R
> chisq.test(matrix(c(10,20,30,40),nrow = 2))$p.value # test1
[1] 0.5040359
> chisq.test(matrix(c(1,2,3,4),nrow = 2))$p.value # test2
[1] 1
Warning message:
In chisq.test(matrix(c(1, 2, 3, 4), nrow = 2)) :
Chi-squared approximation may be incorrect
> chisq.test(matrix(c(1,2,3,4),nrow = 2),correct = FALSE)$p.value # test3
[1] 0.7781597
Warning message:
In chisq.test(matrix(c(1, 2, 3, 4), nrow = 2), correct = FALSE) :
Chi-squared approximation may be incorrect
Python
In [31]:
temp = scipy.stats.chi2_contingency(np.array([[10, 20], [30, 40]])) # test1
temp[1] # pvalue
Out[31]:
0.50403586645250464
In [30]:
temp = scipy.stats.chi2_contingency(np.array([[1, 2], [3, 4]])) # test2
temp[1] # pvalue
Out[30]:
0.67260381744151676
For test1, I am satisfied because Python and R give similar results, but test2 does not match. Since R's chisq.test has the correct parameter, I also tried changing it from the default (test3), and that p-value still differs from Python's. Is there anything wrong with my code? Which one should I "believe"?
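While comparing, I noticed that scipy's chi2_contingency also has a correction argument (Yates' continuity correction, on by default), so the uncorrected statistic can be compared directly with R's correct = FALSE. A minimal sketch of that comparison (my assumption is that correction=False should reproduce R's 0.7781597):

import numpy as np
from scipy import stats

table = np.array([[1, 2], [3, 4]])

# Default behaviour: Yates' continuity correction is applied to 2x2 tables
chi2_corr, p_corr, dof, expected = stats.chi2_contingency(table)

# correction=False gives the plain Pearson chi-square statistic,
# which is what R computes with chisq.test(..., correct = FALSE)
chi2_plain, p_plain, _, _ = stats.chi2_contingency(table, correction=False)

print(p_corr)   # the 0.6726... value shown above
print(p_plain)  # should match R's correct = FALSE p-value (~0.778)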
Update 01
Thanks for the feedback. I am aware that the chi-square test should not be used when cells have counts smaller than 5, and that I should use Fisher's exact test instead; my concern is why R and Python give p-values that differ so much.
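For reference, this is how I would check the expected frequencies and run Fisher's exact test on the same table in Python, a sketch based on my reading of the scipy docs (fisher_exact takes a 2x2 table and returns the odds ratio and p-value; the expected frequencies are the fourth value returned by chi2_contingency):

import numpy as np
from scipy import stats

table = np.array([[1, 2], [3, 4]])

# Expected frequencies under independence -- the quantity that R's
# "Chi-squared approximation may be incorrect" warning refers to
chi2, p, dof, expected = stats.chi2_contingency(table)
print(expected)  # every expected cell count for this table is below 5

# Fisher's exact test for the 2x2 table
odds_ratio, p_exact = stats.fisher_exact(table)
print(p_exact)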