I am trying to rate NFL teams by minimizing the sum of squared errors subject to a constraint. My data looks like:

import numpy as np
import pandas as pd

dat = {"Home_Team": ["KC Chiefs", "LA Chargers", "Baltimore Ravens"],
       "Away_Team": ["Houston Texans", "Miami Dolphins", "KC Chiefs"],
       "Home_Score": [34, 20, 20],
       "Away_Score": [20, 23, 34],
       "Margin": [14, -3, -34]
      }
df = pd.DataFrame(dat)
df

    Home_Team         Away_Team       Home_Score  Away_Score  Margin
0   KC Chiefs         Houston Texans  34          20          14
1   LA Chargers       Miami Dolphins  20          23          -3
2   Baltimore Ravens  KC Chiefs       20          34          -34

Margin is defined as Home_Score - Away_Score. My goal is to come up with a numerical rating for each team such that the average of all the teams' ratings is zero. Hence, if the Chiefs have a rating of 3.0, then they are 3 points better than the average team.

Given these ratings, we generate predictions in this way: the home team's predicted margin of victory is Home_Edge + Home_Rating - Away_Rating, where Home_Edge is the home field advantage (a constant for all home teams), Home_Rating is the home team's rating, and Away_Rating the away team's rating.
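
For a concrete instance of this formula, here is a minimal sketch with made-up numbers. The home edge, the ratings, and the helper name `predict_margin` are assumptions for illustration only, not fitted values:

```python
# Hypothetical values, chosen only to illustrate the formula (not fitted)
home_edge = 2.0
ratings = {"KC Chiefs": 3.0, "Houston Texans": -1.0}

def predict_margin(home_team, away_team):
    """Predicted home margin: Home_Edge + Home_Rating - Away_Rating."""
    return home_edge + ratings[home_team] - ratings[away_team]

# 2.0 + 3.0 - (-1.0) = 6.0
prediction = predict_margin("KC Chiefs", "Houston Texans")
print(prediction)
```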

The error in a prediction is prediction - Margin, and I want to minimize the sum of squares of these errors. I am trying to do this using scipy.optimize in the following way:

# Our objective function, where x is our array of parameters, 
# x[0] is the home edge, x[1] the home rating, and x[2] the away rating
# Y is the true, observed margin
def obj_fun(x, Y):
    y = x[0] + x[1] - x[2]
    return np.sum((y - Y)**2)

# Define the constraint function. We have that the ratings average to 0
def con(x):
    return np.mean(x[1])

# Constraint dictionary
cons = {'type': 'eq', 'fun': con}

# Minimize sum of squared errors
from scipy import optimize

# Initial guesses (numbers I randomly thought of in my head)
home_edge = 0.892
home_ratings = np.array([1.46, 9.67, -0.82])
away_ratings = np.array([-3.10, -6.57, 1.46])
x_init = [np.repeat(home_edge, 3), home_ratings, away_ratings]

# Minimize
results = optimize.minimize(fun = obj_fun, args = (df["Margin"]), 
x0 = x_init, constraints = cons)

print(results.x)
[-2.9413615   0.          4.72534244  1.46        9.67       -0.82
 -3.1        -6.57        1.46      ]

I don't know what's going wrong here. I want my output to contain 6 values, not 9: one for the home edge, and one for each of the five teams. Thank you!

Jake
  • The text, the code and the expectations do not fit together, imho. The dimensions are completely off with regard to the task, I would say. Your objective uses just three variables. Even if you wanted to optimize over 33 vars, your code would never do that. *There is a very basic ingredient missing: generating the pairwise prediction errors.* (At least assuming you didn't already duplicate/expand the pd-df; hard to tell from the question.) `x_init`, which must be `1-d` (or else scipy produces garbage with high probability), is probably problematic too, but I can't infer that without executing it. – sascha Jan 08 '21 at 18:04
  • I see. So I need to be generating pairwise prediction errors in the objective function, or where? How would you go about solving some of these problems, and would it be more helpful if I created an example that you could execute? – Jake Jan 08 '21 at 18:08
  • Depends on what you actually want to optimize. I guess you don't want to assign some value to `T_1` without evaluating the effects on all other teams `T_x` with regard to the observations. Most importantly, you should improve the description. I don't get it, and then it's hard to reason about it. I already got the feeling you are showing columns which aren't used at all (rating vs. scores?). So yeah: there might be theory as well as implementation problems (the latter, here on SO, nearly always boil down to not assuming a 1-d vector of variables, which is a must!). – sascha Jan 08 '21 at 18:11
  • The procedure I'm following is described on pages 285-288 of this pdf (not the literal page numbers in the book!): http://stavochka.com/files/Mathletics.pdf The author uses Excel, I am essentially trying to do the same in Python. – Jake Jan 08 '21 at 18:29
  • @sascha I have created a reproducible example and edited the text. I hope things are clearer for you and any other reader now. – Jake Jan 08 '21 at 20:26
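
The two points raised in the comments (a flat 1-d parameter vector and one prediction error per game) can be sketched as follows. This is only one possible setup, not a confirmed answer: the layout `x = [home_edge, rating_0, ..., rating_4]`, the alphabetical team ordering, the zero starting point, and the choice of `method="SLSQP"` are all assumptions made for illustration. The margins are copied from the question's Margin column.

```python
import numpy as np
from scipy import optimize

# Games from the question: (home team, away team, Margin column value)
games = [
    ("KC Chiefs", "Houston Texans", 14),
    ("LA Chargers", "Miami Dolphins", -3),
    ("Baltimore Ravens", "KC Chiefs", -34),
]

# Give each distinct team one slot in the ratings vector (alphabetical order)
teams = sorted({team for home, away, _ in games for team in (home, away)})
idx = {team: i for i, team in enumerate(teams)}
home_idx = np.array([idx[home] for home, _, _ in games])
away_idx = np.array([idx[away] for _, away, _ in games])
margin = np.array([m for _, _, m in games], dtype=float)

def obj_fun(x):
    # Assumed layout: x[0] is the home edge, x[1:] holds one rating per team
    home_edge, ratings = x[0], x[1:]
    pred = home_edge + ratings[home_idx] - ratings[away_idx]  # one per game
    return np.sum((pred - margin) ** 2)

def con(x):
    # Equality constraint: the team ratings (not the home edge) average to 0
    return np.mean(x[1:])

x_init = np.zeros(1 + len(teams))  # 1-d start: 1 home edge + 5 ratings
results = optimize.minimize(obj_fun, x_init, method="SLSQP",
                            constraints={"type": "eq", "fun": con})

print(results.x.shape)                   # one home edge plus one rating per team
print(dict(zip(teams, results.x[1:])))   # rating per team
```

With only three games and six unknowns the fit is not unique, so the individual ratings returned here depend on the starting point. What the sketch is meant to show is the shape of the problem: `results.x` has six entries, and each game contributes its own squared error through the index arrays.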

0 Answers