How do you account for a correlation in a two variable distribution?

Question

Data here, with trendline - https://i.stack.imgur.com/iljp6.jpg

Hey everyone,

I'm tinkering around with admissions stats and I want to construct a model that uses only one variable, i.e., what are your chances of getting admitted with a GPA of 3.5, or a GPA of 3.7, etc. The issue is that as GMAT (x-axis) increases, GPA increases as well. So the two variables are not independent of each other.

What I was thinking of doing was to simply adjust each data point by a correction that scales with the difference in GMAT. So at 680, I wouldn't adjust anything, but at 730, I would reduce the GPA by 0.08, etc. Is this the correct process, or is there a more formal approach to properly work through this data?
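A minimal sketch of the adjustment described above, assuming the trend is linear. The data here is synthetic and made up for illustration (it is not the data in the linked plot), and `adjust_gpa`, the 0.0016 slope (0.08 GPA per 50 GMAT points), and the 680 reference are hypothetical names and values taken only from the numbers in this post. It estimates the trendline slope by least squares and subtracts the GMAT-related part from each GPA.

```python
import numpy as np

def adjust_gpa(gpa, gmat, slope, reference_gmat=680.0):
    """Subtract the part of GPA that a linear trend attributes to GMAT.

    Hypothetical helper: at reference_gmat the GPA is left unchanged.
    """
    gpa = np.asarray(gpa, dtype=float)
    gmat = np.asarray(gmat, dtype=float)
    return gpa - slope * (gmat - reference_gmat)

# Synthetic data, assumed for illustration only.
rng = np.random.default_rng(0)
gmat = rng.uniform(600, 780, size=500)
gpa = 3.4 + 0.0016 * (gmat - 680) + rng.normal(0, 0.15, size=500)

# Least-squares trendline of GPA on GMAT (what the plotted trendline represents).
slope, intercept = np.polyfit(gmat, gpa, deg=1)

adjusted = adjust_gpa(gpa, gmat, slope)

# Sample correlation with GMAT before and after the adjustment.
corr_before = np.corrcoef(gmat, gpa)[0, 1]
corr_after = np.corrcoef(gmat, adjusted)[0, 1]
print(f"slope={slope:.4f}  corr before={corr_before:.2f}  corr after={corr_after:.2e}")
```

This only removes the linear association between the two variables in the sample; whether the adjusted GPA is a sensible input for an admissions model is a separate question.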

I don't understand. Can you elaborate on why these two variables are not independent and in what way? — user2974951, Sep 03 '19 at 06:44
Thanks for responding. They aren't independent (as far as I know) because a higher GMAT score is correlated with a higher GPA: if you score higher on the GMAT, you are also more likely to have a higher GPA. This hasn't been the case in a few other datasets that I've looked at. I'm trying to split and bucket each of these variables independently, but I don't believe I can, because a candidate who is high on one variable will also tend to be high on the other, which also increases the chances of admission. Do you know of any way to account for this? — DavidDash, Sep 05 '19 at 04:13
There is an important distinction to make here: are these variables merely correlated, that is, a higher GPA also means a higher GMAT on average (or vice versa), or are they actually dependent? — user2974951, Sep 05 '19 at 07:49
Also, if you only want to use one of these, why not pick the one which is better at prediction and use that as is? — user2974951, Sep 05 '19 at 07:50

0 Answers