I want to find out whether there is a relationship between how well students did on a particular test and whether they dropped out of their education. I have a 2×2 table with the variable test level, which takes the values level 1 and level 2, and the variable dropout, which takes the values not active and active. (You can read level 1 as passed the test and level 2 as not passed.)
I think I have run into what is called Simpson's paradox: for every single education programme in the faculty I get a high p-value, indicating no relationship between test level and dropout. But when I pool the data and run the analysis for the whole faculty, I get a low p-value, indicating a significant relationship between the variables.
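To make the pattern concrete, here is a minimal sketch with made-up counts (not my real data) for two hypothetical education programmes, A and B. It assumes a plain Pearson chi-square test without continuity correction, which may differ from what a statistics package does by default. The helper name `chi2_2x2` and the numbers are purely illustrative. Within each programme the dropout rate is the same at both test levels, yet the pooled table looks strongly associated, because programme B has both more level 2 students and more dropout.

```python
import math

def chi2_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Rows: test level 1, test level 2. Columns: not active, active.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts, invented for illustration only.
programmes = {
    "A": [[8, 72], [2, 18]],    # 10% not active at both test levels
    "B": [[8, 12], [32, 48]],   # 40% not active at both test levels
}

# Test each programme separately.
for name, table in programmes.items():
    stat, p_value = chi2_2x2(table)
    print(f"programme {name}: chi2 = {stat:.2f}, p = {p_value:.4f}")

# Pool the programmes cell by cell and test the whole faculty.
pooled = [
    [sum(t[i][j] for t in programmes.values()) for j in range(2)]
    for i in range(2)
]
pooled_stat, pooled_p = chi2_2x2(pooled)
print(f"pooled: chi2 = {pooled_stat:.2f}, p = {pooled_p:.4f}")
```

In this toy example each programme gives a chi-square of 0, while the pooled table gives a chi-square of 8.64 and a p-value well below 0.05, which is the same kind of disagreement I see in my data.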
I have tried to read about Simpson's paradox, but I cannot find any information on how to deal with this problem.
I have read in one place that one should not perform the test on aggregated data, but surely that cannot be true?
I really hope that someone can help me!
Kind regards, Maria