I am trying to create a new column of the moving average (window=3) called 'MA3_WHIP' of a column 'cum_year_WHIP' in a data frame. I tried the following code to achieve it:
read_and_optimized['MA3_WHIP'] = read_and_optimized['cum_year_WHIP'].rolling(3).mean()
But that for some reason does not give me the rolling average that I'd like.
Before creating the 'cum_year_WHIP' column, I sorted the df by 'YEAR_ID' and 'Game_date':
read_and_optimized.sort_values(['YEAR_ID','Game_Date'], ascending=True,inplace=True)
Then I created the 'cum_year_WHIP' column which is the column that the new rolling average column 'MA3_WHIP' is based on and was calculated by using cumsum() on math between three other columns ((cum_walks_a + cum_hits_a)/cum_innings_pitched):
read_and_optimized['cum_year_WHIP'] =(read_and_optimized['cum_year_walks_a'] + read_and_optimized['cum_year_hits_a'])/ read_and_optimized['cum_year_innings_pitched']
In particular, I'd like the 'MA3_WHIP' to be sorted by the 'YEAR_ID' and 'Game_Date' columns just as the 'cum_year_WHIP' column is, as well as grouped by the 'resp_starting_pitcher' and 'YEAR_ID' columns.
To print out what the table looks like, I use this code:
df=read_and_optimized[['YEAR_ID','Game_Date','resp_starting_pitcher','cum_year_WHIP','MA3_WHIP']].sort_values(['YEAR_ID','Game_Date'], ascending=True).groupby(['resp_starting_pitcher','YEAR_ID']).apply(print)
and from that it gives me this undesired output:
YEAR_ID Game_Date resp_starting_pitcher cum_year_WHIP MA3_WHIP
30677 2012 2012-08-25 abadf001 2.000000 1.438035
19247 2012 2012-08-31 abadf001 2.280009 1.547771
35725 2012 2012-09-05 abadf001 2.270277 1.622140
19257 2012 2012-09-12 abadf001 2.234052 1.736054
42448 2012 2012-09-18 abadf001 1.983877 1.646596
19273 2012 2012-09-24 abadf001 1.880600 1.444433
YEAR_ID Game_Date resp_starting_pitcher cum_year_WHIP MA3_WHIP
6930 2011 2011-05-21 aceva001 1.000000 1.257886
17000 2011 2011-05-26 aceva001 1.090909 1.228938
6936 2011 2011-05-31 aceva001 1.437500 1.554379
6954 2011 2011-06-21 aceva001 1.571429 1.710058
What I'd like to get instead is a rolling average of 'cum_year_WHIP' that starts over with each new 'resp_starting pitcher' and at the start of each new 'YEAR_ID'. It should look like this:
YEAR_ID Game_Date resp_starting_pitcher cum_year_WHIP MA3_WHIP
30677 2012 2012-08-25 abadf001 2.000000 Nan
19247 2012 2012-08-31 abadf001 2.280009 Nan
35725 2012 2012-09-05 abadf001 2.270277 2.183428
19257 2012 2012-09-12 abadf001 2.234052 2.261446
42448 2012 2012-09-18 abadf001 1.983877 2.162735
19273 2012 2012-09-24 abadf001 1.880600 2.032843
YEAR_ID Game_Date resp_starting_pitcher cum_year_WHIP MA3_WHIP
6930 2011 2011-05-21 aceva001 1.000000 Nan
17000 2011 2011-05-26 aceva001 1.090909 Nan
6936 2011 2011-05-31 aceva001 1.437500 1.171613
6954 2011 2011-06-21 aceva001 1.571429 1.366612
YEAR_ID Game_Date resp_starting_pitcher cum_year_WHIP MA3_WHIP
7210 2013 2013-04-11 aceva001 1.800000 Nan
13938 2013 2013-04-17 aceva001 1.900000 Nan
7226 2013 2013-04-23 aceva001 2.250006 1.983333
7260 2013 2013-05-27 aceva001 2.068969 2.072991
44210 2013 2013-06-12 aceva001 1.894739 2.071238
7276 2013 2013-06-18 aceva001 1.780222 1.914643
When I use the following it works to produce a view of the way the table should look:
read_and_optimized.groupby(['resp_starting_pitcher','YEAR_ID'])['cum_year_WHIP'].rolling(3).mean()
However, when I try to make a new column from the above code as suggested in other posts of similar problems, it gives me an error:
read_and_optimized['MA3_WHIP']= read_and_optimized.groupby(['resp_starting_pitcher','YEAR_ID'])['cum_year_WHIP'].rolling(window=3).mean()
The error is:
TypeError: incompatible index of inserted column with frame index
Is there a way I can accomplish creating this new column in the data frame?
I have viewed answers to similar dilemmas at: Why is groupby and rolling not working together?
and at: Pandas - moving averages - use values of previous X entries for current row
but am not able to get it done.
Help getting this done would be much appreciated.