I am trying to rate NFL teams by minimizing the sum of squared errors subject to a constraint. My data looks like:
import numpy as np
import pandas as pd

dat = {"Home_Team": ["KC Chiefs", "LA Chargers", "Baltimore Ravens"],
       "Away_Team": ["Houston Texans", "Miami Dolphins", "KC Chiefs"],
       "Home_Score": [34, 20, 20],
       "Away_Score": [20, 23, 34],
       "Margin": [14, -3, -34]
       }
df = pd.DataFrame(dat)
df
          Home_Team       Away_Team  Home_Score  Away_Score  Margin
0         KC Chiefs  Houston Texans          34          20      14
1       LA Chargers  Miami Dolphins          20          23      -3
2  Baltimore Ravens       KC Chiefs          20          34     -34
Margin is defined as Home_Score - Away_Score. My goal is to come up with a numerical rating for each team such that the average of all the teams' ratings is zero. For example, if the Chiefs have a rating of 3.0, then they are 3 points better than the average team.
Given these ratings, we generate predictions in this way: the home team's predicted margin of victory is Home_Edge + Home_Rating - Away_Rating, where Home_Edge is the home field advantage (a constant for all home teams), Home_Rating is the home team's rating, and Away_Rating the away team's rating.
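To make the prediction formula concrete, here is a minimal sketch that applies it to the three games above. The rating values, the `home_edge` value, and the names `ratings`, `games`, and `predicted_margin` are made up for illustration; they are not fitted results.

```python
import pandas as pd

# Hypothetical ratings, chosen only so that they average to zero.
ratings = {
    "KC Chiefs": 6.0,
    "Houston Texans": -2.0,
    "LA Chargers": 1.0,
    "Miami Dolphins": -1.0,
    "Baltimore Ravens": -4.0,
}
home_edge = 2.0  # hypothetical home-field advantage, same for every home team

games = pd.DataFrame({
    "Home_Team": ["KC Chiefs", "LA Chargers", "Baltimore Ravens"],
    "Away_Team": ["Houston Texans", "Miami Dolphins", "KC Chiefs"],
})

# Prediction = Home_Edge + Home_Rating - Away_Rating, one value per game.
predicted_margin = (
    home_edge
    + games["Home_Team"].map(ratings)
    - games["Away_Team"].map(ratings)
)
print(predicted_margin.tolist())  # [10.0, 4.0, -8.0]
```

Note that each team has exactly one rating here, regardless of whether it plays at home or away in a given game.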
The error in a prediction is prediction - Margin, and I want to minimize the sum of squares of these errors. I am trying to do this using scipy.optimize in the following way:
# Our objective function, where x is our array of parameters,
# x[0] is the home edge, x[1] the home rating, and x[2] the away rating
# Y is the true, observed margin
def obj_fun(x, Y):
    y = x[0] + x[1] - x[2]
    return np.sum((y - Y)**2)

# Define the constraint function. We have that the ratings average to 0
def con(x):
    return np.mean(x[1])

# Constraint dictionary
cons = {'type': 'eq', 'fun': con}

# Minimize sum of squared errors
from scipy import optimize

# Initial guesses (numbers I randomly thought of in my head)
home_edge = 0.892
home_ratings = np.array([1.46, 9.67, -0.82])
away_ratings = np.array([-3.10, -6.57, 1.46])
x_init = [np.repeat(home_edge, 3), home_ratings, away_ratings]

# Minimize
results = optimize.minimize(fun=obj_fun, args=(df["Margin"]),
                            x0=x_init, constraints=cons)
print(results.x)
[-2.9413615 0. 4.72534244 1.46 9.67 -0.82
-3.1 -6.57 1.46 ]
I expected the result to contain 6 values, not 9: one for the home edge and one rating for each of the five teams. What is going wrong here? Thank you!
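For reference, the six-value layout described above could be set up roughly as in the sketch below. It is a sketch under assumptions rather than a confirmed diagnosis of the code above: the names `games`, `teams`, `team_idx`, `sse`, and `mean_rating` are hypothetical, the margin is recomputed from the scores as Home_Score - Away_Score, and SLSQP is chosen explicitly as the solver.

```python
import numpy as np
import pandas as pd
from scipy import optimize

games = pd.DataFrame({
    "Home_Team": ["KC Chiefs", "LA Chargers", "Baltimore Ravens"],
    "Away_Team": ["Houston Texans", "Miami Dolphins", "KC Chiefs"],
    "Home_Score": [34, 20, 20],
    "Away_Score": [20, 23, 34],
})
# Margin recomputed from the scores (Home_Score - Away_Score).
margin = (games["Home_Score"] - games["Away_Score"]).to_numpy(dtype=float)

# Hypothetical helpers: one integer index per distinct team.
teams = sorted(set(games["Home_Team"]) | set(games["Away_Team"]))
team_idx = {name: i for i, name in enumerate(teams)}
home_idx = games["Home_Team"].map(team_idx).to_numpy()
away_idx = games["Away_Team"].map(team_idx).to_numpy()

def sse(x):
    # x[0] is the home edge; x[1:] holds one rating per team.
    ratings = x[1:]
    pred = x[0] + ratings[home_idx] - ratings[away_idx]
    return np.sum((pred - margin) ** 2)

def mean_rating(x):
    # Equality constraint: the team ratings average to zero.
    return np.mean(x[1:])

x0 = np.zeros(1 + len(teams))  # flat vector: 1 home edge + 5 ratings
result = optimize.minimize(
    sse, x0, method="SLSQP",
    constraints={"type": "eq", "fun": mean_rating},
)
print(result.x.shape)  # (6,)
```

With only three games and six unknowns the minimizer is not unique, so the particular values returned here should not be read as meaningful ratings; the point is only the shape of the parameter vector.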