I am trying to create a column of the 10-day moving average of points for nba players. My dataframe has game by game statistics for each player, and I would like to have the moving average column contain the 10 day moving average at that point. I have tried df.groupby('player')['points].rolling(10,1).mean, but this is just giving me the number of points scored on that day as the moving average. All of the players from each day are listed and then the dataframe moves onto the following day, so I could have a couple hundred rows with the same date but different players' stats. Any help would be greatly appreciated. Thanks.
Asked
Active
Viewed 602 times
1
-
Your question appears to be missing information necessary to assist you. Please include a [MRE - Minimal, Reproducible, Example](https://stackoverflow.com/help/minimal-reproducible-example) in which all of the necessary information is present so that it can be easily and efficiently tested. Please include the relevant portions of your solution and a sample of your DataFrame. `df.to_dict().head(15)` or so would probably be a reasonable amount to test your rolling average. Also please mockup a sample of what your expected output is **for the sample data you provide** not your full dataset. – Henry Ecker Apr 22 '21 at 21:41
1 Answers
3
As stated, you really should provide a sample dataset, and show what you are trying to achieve. However, I love working with sports data so don't mind puting in the minute or so to get a sample set.
So basically you need to do a rolling mean on a groupby. You'll notice obviously the first 10 rows of each player are blank, because it doesn't have 10 dates to take the mean of. You can change that by changing the min to 1. Also, when you do this, you want to make sure your data is sorted by date (which here it already is).
import pandas as pd
player_link_list = ['https://www.basketball-reference.com/players/l/lavinza01/gamelog/2021/',
'https://www.basketball-reference.com/players/v/vucevni01/gamelog/2021/',
'https://www.basketball-reference.com/players/j/jamesle01/gamelog/2021/',
'https://www.basketball-reference.com/players/d/davisan02/gamelog/2021/']
dfs = []
for link in player_link_list:
w=1
df = pd.read_html(link)[-1]
df = df[df['Rk'].ne('Rk')]
df = df[df['PTS'].ne('Inactive')]
df['Player'] = link.split('/')[-4]
df['PTS'] = df['PTS'].astype(int,errors = 'ignore')
dfs.append(df)
df = pd.concat(dfs)
df['rolling_10_avg'] = df.groupby('Player')['PTS'].transform(lambda s: s.rolling(10, min_periods=10).mean())

chitown88
- 27,527
- 4
- 30
- 59
-
Thanks for the response! So I figured out that the way my DataFrame was indexed (indexing returned to 1 for each new day of games) in a way that did not allow for proper calculation, but when I reset the index prior to running this it worked! Thanks for the help - if I have other questions related to this project I know who to come to! – Steven Miller Apr 26 '21 at 12:53