I have a dataframe of daily sales:
import pandas as pd
date = ['28-01-2017','29-01-2017','30-01-2017','31-01-2017','01-02-2017','02-02-2017']
sales = [1,2,3,4,1,2]
ym = [201701,201701,201701,201701,201702,201702]
prev_1_ym = [201612,201612,201612,201612,201701,201701]
prev_2_ym = [201611,201611,201611,201611,201612,201612]
df_test = pd.DataFrame({'date': date,'ym':ym,'prev_1_ym':prev_1_ym,'prev_2_ym':prev_2_ym,'sales':sales})
df_test['date'] = pd.to_datetime(df_test['date'],format = '%d-%m-%Y')
I am trying to find total sales in the previous 1m, previous 2m etc..
My current approach is to use a list comprehension:
df_test[prev_1m_sales] = [ sum(df_test.loc[df_test['ym'] == x].sales) for x in df_test[prev_1_ym] ]
However, this proves to be very slow.
Is there a way to speed it up by using .groupby()?