I am doing a project on Premier League data. I figured I would start with a simple regression of league finish on wins (regress finish wins). The coefficient given is -.95. I thought this was off, so I regressed finish on losses; the coefficient given is +.95. Obviously this is not accurate: more wins doesn't make you finish lower in the league table. My data for finish are as you would expect, with a value of 1 for the champion and a value of 20 for the worst team. My data for wins are also logical: the more wins you have, the higher your value will be. The better teams may have 20 wins and the worse teams 8. These are the actual values that they are given.

I think Stata has my intentions reversed somehow. Does it think that a higher value for Wins is bad? I assume it thinks I am ranking them by total games won, not the actual number of games won. How do I fix this?

Nick Cox
  • If the final league position is 1 = top, 2 = second and so on, then a lower value is better. So I would expect wins to be negatively correlated with league position. – Mark Pattison Feb 21 '17 at 21:46
  • My understanding of regression output is that the coefficient next to the independent variable, in this case wins, is the unit increase or decrease in the dependent variable, in this case "finish", for each 1-unit increase in the independent variable, wins. So I would not expect it to be negatively correlated http://stats.idre.ucla.edu/stata/output/regression-analysis-2/ – harrison Feb 21 '17 at 22:02
  • More wins will leave a team higher up the table, which we would normally call a HIGHER league position, but if you're labelling the league positions as 1 = top etc, it will actually mean a LOWER value. So the correlation is reversed. It's only down to how we label the league positions. – Mark Pattison Feb 21 '17 at 22:29
  • Think about it this way: which is a higher league position, 1st or 20th? Now, which is a higher number, 1 or 20? – Mark Pattison Feb 21 '17 at 22:32
  • Is there a way I can tell Stata this, or do I simply have to create a new variable for finish ordered the other way? – harrison Feb 22 '17 at 01:38
  • There is no programming question here. It's purely that there is statistical confusion about regression with a response variable for which low values mean doing well. If you reverse a scale, the sign of the regression coefficient is flipped. There is no Stata problem, obvious or otherwise. Off-topic, but a better home would be Cross Validated. – Nick Cox Feb 22 '17 at 10:47

1 Answer

The coefficient is coming out negative because of how the league finishing positions are labelled.

Because the best position, i.e. first place, is counted as 1, with lower positions given increasing values (2, 3...), a higher/better league position is actually associated with a lower value.

As a result, a team with a HIGHER number of wins would be expected to have a LOWER value of their league position.

Hence, the correlation of the number of wins and league position is expected to be negative.

To deal with this, you could either:

  • Create a new variable for finish position which is ordered such that a better league position corresponds to a higher value. The simplest way to do this would be something like X=21-F, if there are 20 teams and the league position is F.
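  • Accept that the coefficient is negative and interpret it accordingly: a coefficient of -.95 means that each additional win is associated with a finishing position about 0.95 places closer to the top of the table.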
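
To illustrate the first option, here is a minimal Python sketch (not Stata) of what reversing the scale does to the slope. The six-team data and the helper name ols_slope are made up for illustration and are not from the question; the point is only that regressing the reversed finish on wins gives the same slope with the opposite sign.

```python
# Hypothetical data (not from the question): wins and final league position
# for six teams, where finish = 1 is the champion.
wins = [27, 23, 19, 14, 10, 8]
finish = [1, 2, 3, 4, 5, 6]
n_teams = len(finish)


def ols_slope(x, y):
    """Slope of the least-squares line for y regressed on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Reverse the scale so that a higher value means a better finish.
# For a 20-team league this is 21 - F; here it is n_teams + 1 - F.
finish_reversed = [n_teams + 1 - f for f in finish]

slope_original = ols_slope(wins, finish)           # negative: more wins, lower number
slope_reversed = ols_slope(wins, finish_reversed)  # same size, opposite sign

print(f"finish on wins:          {slope_original:.3f}")
print(f"reversed finish on wins: {slope_reversed:.3f}")
```

Nothing about the fit changes apart from the sign of the slope (and the intercept), so which option you pick is purely a matter of how you prefer to read the output.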
Mark Pattison