Performing a chi-square test for independence in SAS: Simpson's paradox

Question

I want to find out whether there is a relationship between how well students did on a particular test and dropout from education. I have a 2×2 table with the variable Level in test, which takes the values level 1 and level 2, and the variable dropout, which takes the values not active and active (you can think of level 1 as passing the test and level 2 as not passing).

I think I have run into what is called Simpson's paradox: every single education in the faculty has a high p-value, indicating no relationship between level in test and dropout. But when I pool the data and perform the analysis for the whole faculty, I get a low p-value, indicating a significant relationship between the variables. I have tried to read about Simpson's paradox, but I cannot find out how to deal with this problem. I have read in one place that one should not perform the test on aggregated data, but surely that cannot be true?

I really hope that someone can help me!

Kind regards, Maria

score 0 · Answer 1 · answered Jun 25 '14 at 14:05

For the cross-tabs labeled education 2 and education 5, you have cell counts below 5, which violates the usual assumption for running a chi-square test (strictly, the rule of thumb concerns expected counts). There are arguments that the chi-square test is robust enough to withstand this limitation, but I would still reconsider your grouping methodology.
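To make the small-cell check concrete, here is a minimal Python sketch that computes the expected counts for a 2×2 table and flags any below 5. The counts are hypothetical (the original tables are not reproduced in this thread), and `expected_counts` is an illustrative helper name, not a SAS or library function.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Hypothetical counts, NOT the data from the question.
# Rows: level 1, level 2. Columns: not active, active.
table = [[12, 3], [20, 2]]

exp = expected_counts(table)
# True if any expected count falls below the usual threshold of 5
small = any(e < 5 for row in exp for e in row)

for row in exp:
    print([round(e, 2) for e in row])
print("any expected count below 5:", small)
```

If `small` is true for a table, an exact test is the usual alternative to the chi-square approximation.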

jdolan1

In the cross-tabs with expected values less than 5, I use the p-value from Fisher's exact test. I guess that is OK? Why would you reconsider the grouping methodology? I don't understand why I cannot group the data. – user1626092 Jun 26 '14 at 07:08
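For reference, the two-sided Fisher's exact test mentioned in this comment can be sketched in plain Python. This is only an illustration under assumptions: the table is hypothetical, `fisher_exact_two_sided` is a made-up helper name, and the two-sided p-value is defined here as the sum of the probabilities of all tables (with the same margins) that are no more likely than the observed one.

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_obs = prob(a)
    lo = max(0, col1 - row2)   # smallest feasible top-left count
    hi = min(row1, col1)       # largest feasible top-left count
    # Small relative tolerance so ties with p_obs are counted despite rounding
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))

# Hypothetical counts, NOT the data from the question
table = [[3, 1], [1, 3]]
p_value = fisher_exact_two_sided(table)
print(round(p_value, 4))
```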

score 0 · Answer 2 · answered Feb 07 '15 at 00:48

Since the total number of cases in 'Faculty' is larger, there is enough data to reject the independence hypothesis, hence the low p-value. When the number of cases is small (your education 1 to education 5 tables), there is not enough data to show significance. A higher p-value there only says that the observed differences could be due to chance.

This is not an example of Simpson's paradox.
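The sample-size point can be illustrated with a short Python sketch: the same cell proportions give a non-significant Pearson chi-square in a small table and a significant one when every count is ten times larger. The counts are hypothetical, `chi_square_2x2` is an illustrative helper, and no continuity correction is applied.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 df, no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts, NOT the data from the question
small = [[10, 5], [8, 7]]                         # one small "education" table
large = [[10 * x for x in row] for row in small]  # same proportions, 10x the cases

chi2_small, p_small = chi_square_2x2(small)
chi2_large, p_large = chi_square_2x2(large)
print(f"small: chi2={chi2_small:.3f}, p={p_small:.3f}")
print(f"large: chi2={chi2_large:.3f}, p={p_large:.3f}")
```

With fixed proportions the statistic grows in proportion to the number of cases, so `p_small` stays above 0.05 while `p_large` falls below it.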
