9

I have a data set with first column is the Date and Second column is the Price. The Date is trading days.

enter image description here

I want to return a table looks like this:

enter image description here

Where the date is each Month starting from 2006, price MA is the average price of past N months.(N = [1,2,3,4,5,6])

So for example: If I want N = 1 at Jan.1 2006 Ma should be the average price from December last year If N =2 Ma should be the average price from Nov and December last year

I have read some solution about Extract month from datetime and groupby. But don't know how to put them up together.

Dylan
  • 915
  • 3
  • 13
  • 20
  • Look [here](http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.rolling_mean.html) and use the rolling mean – 2Obe Aug 22 '17 at 20:38

3 Answers3

9

Or you simply try

df.sort_index(ascending=False).rolling(5).mean().sort_index(ascending=True)

For your additional question

index=pd.date_range(start="4th of July 2017",periods=30,freq="D")
df=pd.DataFrame(np.random.randint(0,100,30),index=index)
df['Month']=df.index
df.Month=df.Month.astype(str).str[0:7]
df.groupby('Month')[0].mean()


Out[162]: 
Month
2017-07    47.178571
2017-08    56.000000
Name: 0, dtype: float64

EDIT 3 : For missing value rolling two month mean

index=pd.date_range(start="4th of July 2017",periods=300,freq="D")
df=pd.DataFrame(np.random.randint(0,100,300),index=index)
df['Month']=df.index
df.Month=df.Month.astype(str).str[0:7]
df=df.groupby('Month')[0].agg({'sum':'sum','count':'count'})
df['sum'].rolling(2).sum()/df['count'].rolling(2).sum()


Out[200]: 
Month
2017-07          NaN
2017-08    43.932203
2017-09    45.295082
2017-10    46.967213
2017-11    46.327869
2017-12    49.081967
#etc
BENY
  • 317,841
  • 20
  • 164
  • 234
  • In combination with dropna to get rid of the NA values at the end like `df.sort_index().rolling(5).mean().dropna()` – 2Obe Aug 22 '17 at 21:02
  • @2Obe maybe he want `fillna` with some value – BENY Aug 22 '17 at 21:03
  • by using rolling(5) we are fixing the observation numbers to 5 right? But the trading days in the previous month is not a fixed number. – Dylan Aug 22 '17 at 21:55
  • @Dylan then you need `resample` https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.resample.html – BENY Aug 22 '17 at 22:07
  • @Wen Thanks, I found out that df.resample('M').mean() exactly solved my problem when N = 1. When I want to try case where N =2. I used df.resample('2M').mean() I am getting Jan, Mar, May ,July. The numbers are the right ones I want( mean of past 2 months.) However I don't have Prices for Feb,April and so on. How should I modify this? – Dylan Aug 22 '17 at 22:22
  • @Dylan and you can look at fillna there are method called ffill and bfill – BENY Aug 22 '17 at 22:47
  • @Wen This won't work since there are no missing values. I find this question is the same as mine: https://stackoverflow.com/questions/24875671/resample-in-a-rolling-window-using-pandas He clarified why resample won't work for N > 1 case – Dylan Aug 22 '17 at 22:57
  • @Wen Thanks for updating, however I think the updated answer only gives the mean of past 1 month right? If so, the effect would be the same as using resample('M'). What if I want the mean of past 3,4,5 months? – Dylan Aug 23 '17 at 00:10
  • @Dylan , as you said you have missing data, So , for M=2 Jan + Mar or Jan +Feb ? – BENY Aug 23 '17 at 00:54
3

Will return the rolling mean for the number of periods specified with the window parameter. E.g. window=1 will retunr the original list. Window=2 will calculate the mean for 2 days and so on.

index=pd.date_range(start="4th of July 2017",periods=30,freq="D")



df=pd.DataFrame(np.random.randint(0,100,30),index=index)


print([pd.rolling_mean(df,window=i,freq="D") for i in range(1,5)])

.....

2017-07-04   NaN
2017-07-05  20.5
2017-07-06  64.5
2017-07-07  58.5
2017-07-08  13.0
2017-07-09   4.5
2017-07-10  17.5
2017-07-11  23.5
2017-07-12  40.5
2017-07-13  60.0
2017-07-14  73.0
2017-07-15  90.0
2017-07-16  56.5
2017-07-17  55.0
2017-07-18  57.0
2017-07-19  45.0
2017-07-20  77.0
2017-07-21  46.5
2017-07-22   3.5
2017-07-23  48.5
2017-07-24  71.5
2017-07-25  52.0
2017-07-26  56.5
2017-07-27  47.5
2017-07-28  64.0
2017-07-29  82.0
2017-07-30  68.0
2017-07-31  72.5
2017-08-01  58.5
2017-08-02  67.0

.....

Further you can drop NA values with the df dropna method like:

df.rolling(window=2,freq="D").mean().dropna() #Here you must adjust the window  size

So the whole code which should print you the rolling mean for the months is:

print([df.rolling(i,freq="m").mean().dropna() for i in range(len(df.rolling(window=1,freq="m").sum()))])
2Obe
  • 3,570
  • 6
  • 30
  • 54
-1

First, set Date as index:

price_df.set_index('Date', inplace=True)
price_df.index = pd.to_datetime(price_df.index)

Then, calculate moving average from past N months:
mv = price_df.rolling(window=i*30, center=False).mean().dropna() for N=i

Finally, return a subset only with first day of each month (if that is what you want to return):
mv.ix[mv.index.day==1]

Yanfei W.
  • 1
  • 1