The College Football Database (cfbd) contains all team ranks for each week of every college football season going back to 1937. I am trying to set up data from the cfbd in a way that will allow me to examine how a particular team's AP Top 25 ranking changes week to week across multiple seasons. Ideally, I would like to organize this data like so:
Team | Season | Week1 Rank | Week2 Rank | 16 more weeks' worth of ranks | ...
---|---|---|---|---|---
Alabama | 1937 | 4 | 3 | etc | etc
Alabama | 1938 | 3 | 6 | etc |
etc | | | | |
Wyoming | 2017 | 24 | nr | |
Wyoming | 2020 | nr | 25 | |
So the output table will have a row for every school that has ever been ranked for each year that it has been ranked.
After running `pip install cfbd`, I set up the API:
import pandas as pd
import cfbd
configuration = cfbd.Configuration()
# The cfbd API requires users to sign up for a free
# private key in order to access the data.
# https://collegefootballdata.com/key
configuration.api_key['Authorization'] = '[MY_SECRET_KEY]'
configuration.api_key_prefix['Authorization'] = 'Bearer'
api_rankings = cfbd.RankingsApi(cfbd.ApiClient(configuration))
# Initialize a blank dataframe
allrankings_df = pd.DataFrame()
# year is required in the API call, so I have to collect each year separately
for yr in range(1936, 2022):
    rankings = api_rankings.get_rankings(year=yr)  # get the data
    rankings_df = pd.DataFrame.from_records([p.to_dict() for p in rankings])
    # DataFrame.append was removed in pandas 2.0, so use pd.concat instead
    allrankings_df = pd.concat([allrankings_df, rankings_df], ignore_index=True)
This gives me a dataframe structured like so:
season | season_type | week | List of poll objects
---|---|---|---
1936 | regular | 1 | object
Those poll objects have a poll name ("AP Top 25", "Coaches Poll") and ranking data. Each one looks like this, except that every week has four or five different polls:
{'poll': 'Coaches Poll',
 'ranks': [{'rank': 1,
            'school': 'Alabama',
            'conference': 'SEC',
            'firstPlaceVotes': 44,
            'points': 1601},
           {'rank': 2,
            'school': 'Clemson',
            'conference': 'ACC',
            'firstPlaceVotes': 14,
            'points': 1536},
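To make the nesting concrete, here is a rough sketch of how the list-of-polls column might be unpacked with `DataFrame.explode`. The dataframe `weekly`, its values, and the column name `polls` are invented stand-ins for the dataframe built above, so the names may need adjusting against the real output.

```python
import pandas as pd

# Toy stand-in (values invented): one row per season/week, with a list of
# poll dicts in a hypothetical `polls` column.
weekly = pd.DataFrame({
    "season": [2020, 2020],
    "week": [1, 2],
    "polls": [
        [{"poll": "AP Top 25", "ranks": [{"rank": 1, "school": "Clemson"}]},
         {"poll": "Coaches Poll", "ranks": [{"rank": 1, "school": "Alabama"}]}],
        [{"poll": "AP Top 25", "ranks": [{"rank": 1, "school": "Alabama"}]}],
    ],
})

# First explode: one row per (season, week, poll).
per_poll = weekly.explode("polls", ignore_index=True)
per_poll["poll"] = per_poll["polls"].map(lambda p: p["poll"])
per_poll["ranks"] = per_poll["polls"].map(lambda p: p["ranks"])

# Keep only the poll of interest.
ap = per_poll[per_poll["poll"] == "AP Top 25"]

# Second explode: one row per (season, week, school).
per_team = ap.explode("ranks", ignore_index=True)
per_team["school"] = per_team["ranks"].map(lambda r: r["school"])
per_team["rank"] = per_team["ranks"].map(lambda r: r["rank"])
per_team = per_team[["season", "week", "school", "rank"]]
```

Each `explode` call peels off one level of nesting, so two calls are enough to get from one row per week to one row per team per week.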
The API describes all of this like so:
[
  {
    "season": 0,
    "seasonType": "string",
    "week": 0,
    "polls": [
      {
        "poll": "string",
        "ranks": [
          {
            "rank": 0,
            "school": "string",
            "conference": "string",
            "firstPlaceVotes": 0,
            "points": 0
          }
        ]
      }
    ]
  }
]
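Given that schema, one option that avoids hand-written loops might be `pandas.json_normalize`, which can walk a nested record path. The sketch below runs on an invented `records` list shaped like the schema above. It assumes the key names match the JSON documentation, and the Python client's `to_dict()` output may name them differently.

```python
import pandas as pd

# Invented sample shaped like the documented response (values are made up).
records = [
    {
        "season": 2020,
        "seasonType": "regular",
        "week": 1,
        "polls": [
            {"poll": "AP Top 25",
             "ranks": [
                 {"rank": 1, "school": "Clemson", "conference": "ACC",
                  "firstPlaceVotes": 38, "points": 1520},
                 {"rank": 2, "school": "Alabama", "conference": "SEC",
                  "firstPlaceVotes": 20, "points": 1490},
             ]},
            {"poll": "Coaches Poll",
             "ranks": [
                 {"rank": 1, "school": "Alabama", "conference": "SEC",
                  "firstPlaceVotes": 44, "points": 1601},
             ]},
        ],
    },
]

# record_path walks polls -> ranks; meta carries the outer fields down
# onto every rank row, including the poll name one level up.
long_df = pd.json_normalize(
    records,
    record_path=["polls", "ranks"],
    meta=["season", "seasonType", "week", ["polls", "poll"]],
)
long_df = long_df.rename(columns={"polls.poll": "poll"})
```

The result has one row per ranked team per poll per week, which is a "long" layout that can then be filtered to a single poll.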
Phew. I can sort of picture how to do this by iterating through every year & week & polling object and building a new table piece by piece. But I also have read many times that I shouldn't do that - that I should be vectorizing.
I know at this point I should share what I have tried so far, but to be honest, I am nowhere close. Can anyone point me in the right direction? I am willing to bang my head on this, but I can't even tell how I should be banging. Do I need some dataframe methods like `melt` or `ravel`? Or should I be trying to set this up with Boolean dataframe referencing?
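For what it's worth, `melt` reshapes wide data to long, so the target layout looks more like the reverse operation, `pivot`. Here is a minimal sketch on made-up data, assuming the rankings have already been flattened to one row per school, season, and week. The frame `long_df` and its column names are hypothetical.

```python
import pandas as pd

# Invented long-format sample: one row per (school, season, week).
long_df = pd.DataFrame({
    "school": ["Alabama", "Alabama", "Wyoming", "Alabama"],
    "season": [1937, 1937, 2020, 1938],
    "week": [1, 2, 2, 1],
    "rank": [4, 3, 25, 3],
})

# One row per (school, season), one column per week.
# Weeks in which a team was unranked come out as NaN.
wide = long_df.pivot(index=["school", "season"], columns="week", values="rank")
wide.columns = [f"Week{w} Rank" for w in wide.columns]
wide = wide.reset_index()
```

Passing a list to `index` requires pandas 1.1 or later. The NaN cells correspond to the "nr" entries in the target table.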
references: https://api.collegefootballdata.com/api/docs/?url=/api-docs.json#/rankings/getRankings https://pypi.org/project/cfbd/