My dataset contains the following columns:
Voted?   Political Category
1        Right
0        Left
1        Center
1        Right
1        Right
1        Right
I need to find out which political category is most strongly associated with having voted, and to do this I want to run a chi-squared test. First, I would like to group by Voted? and Political Category to get counts like these:
(1, Right): 1500 people
(0, Right): 212 people
(1, Left): 826 people
(0, Left): 652 people
(1, Center): 431 people
(0, Center): 542 people
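A minimal pandas sketch of this grouping step, assuming the data is loaded into a DataFrame whose columns are literally named `Voted?` and `Political Category`. The toy `df` below holds only the six example rows shown above, not the full dataset. `groupby(...).size()` gives the (Voted?, category) counts in the listing format, and `pd.crosstab` arranges the same counts as a contingency table.

```python
import pandas as pd

# Toy data: only the six example rows shown above (the full dataset is not reproduced)
df = pd.DataFrame({
    "Voted?": [1, 0, 1, 1, 1, 1],
    "Political Category": ["Right", "Left", "Center", "Right", "Right", "Right"],
})

# Number of people per (Voted?, Political Category) pair
counts = df.groupby(["Voted?", "Political Category"]).size()
print(counts)

# Same counts as a contingency table: rows = Voted?, columns = category;
# combinations that never occur are filled with 0
table = pd.crosstab(df["Voted?"], df["Political Category"])
print(table)
```

On the real data, `table` is the object you would pass to a chi-squared test.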
In R, I would do:
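yes = c(1500, 826, 431)    # voted: Right, Left, Center
no = c(212, 652, 542)      # did not vote: Right, Left, Center
TBL = rbind(yes, no); TBL  # stack as rows and print the 2 x 3 table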
[,1] [,2] [,3]
yes 1500 826 431
no 212 652 542
and then run
chisq.test(TBL, correct = FALSE)
which gives:
X-squared = 630.08, df = 2, p-value < 2.2e-16
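One way to reproduce this in Python is `scipy.stats.chi2_contingency`, sketched here under the assumption that the columns are ordered Right, Left, Center as in the grouped counts. The labelled DataFrame mirrors `rbind(yes, no)`. `correction=False` corresponds to R's `correct = FALSE`.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Counterpart of TBL = rbind(yes, no): rows yes/no, columns in the
# (assumed) order Right, Left, Center
tbl = pd.DataFrame(
    [[1500, 826, 431],
     [212, 652, 542]],
    index=["yes", "no"],
    columns=["Right", "Left", "Center"],
)

# correction=False mirrors R's correct = FALSE; scipy only applies Yates'
# correction when dof == 1, so it would not change this 2 x 3 result anyway
chi2, p, dof, expected = chi2_contingency(tbl.to_numpy(), correction=False)
print(f"X-squared = {chi2:.2f}, df = {dof}, p-value = {p:.3g}")
```

With these counts the statistic and degrees of freedom should agree with R (X-squared = 630.08, df = 2). scipy prints the actual, extremely small p-value instead of R's "< 2.2e-16", which is only the smallest value R displays.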
prop.test would be even better, since it also reports the proportion of people who voted in each political category:
prop 1 prop 2 prop 3
0.8761682 0.5588633 0.4429599
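When there are more than two groups, R's prop.test reports each group's proportion and runs a Pearson chi-squared test of equal proportions without continuity correction. A minimal sketch can therefore compute the proportions directly with NumPy and reuse `chi2_contingency` for the test. The category order is assumed to match the R vectors. statsmodels' `proportions_chisquare` (in `statsmodels.stats.proportion`) may also be worth checking, but it is not used here.

```python
import numpy as np
from scipy.stats import chi2_contingency

categories = ["Right", "Left", "Center"]  # assumed order, as in the R vectors
yes = np.array([1500, 826, 431])  # voted
no = np.array([212, 652, 542])    # did not vote

# Share of voters within each category: the prop 1..3 values from prop.test
props = yes / (yes + no)
for name, prop in zip(categories, props):
    print(f"{name}: {prop:.7f}")

# With k > 2 groups, prop.test's statistic equals the uncorrected Pearson
# chi-squared on the 2 x k table, so chi2_contingency gives the same test
chi2, p, dof, _ = chi2_contingency(np.vstack([yes, no]), correction=False)
print(f"X-squared = {chi2:.2f}, df = {dof}")

# Category with the highest proportion of voters
best = categories[int(np.argmax(props))]
print("Highest proportion of voters:", best)
```

The chi-squared test only shows that voting and political category are not independent. The per-category proportions show where the difference lies.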
I would like to get the same, or similar, results in Python.