
I have a dataframe of daily sales:

import pandas as pd

date = ['28-01-2017','29-01-2017','30-01-2017','31-01-2017','01-02-2017','02-02-2017']
sales = [1,2,3,4,1,2]
ym = [201701,201701,201701,201701,201702,201702]
prev_1_ym = [201612,201612,201612,201612,201701,201701]
prev_2_ym = [201611,201611,201611,201611,201612,201612]

df_test = pd.DataFrame({'date': date,'ym':ym,'prev_1_ym':prev_1_ym,'prev_2_ym':prev_2_ym,'sales':sales})

df_test['date'] = pd.to_datetime(df_test['date'],format = '%d-%m-%Y')

I am trying to find the total sales in the previous 1 month, previous 2 months, etc.

My current approach is to use a list comprehension:

df_test['prev_1m_sales'] = [ sum(df_test.loc[df_test['ym'] == x].sales) for x in df_test['prev_1_ym'] ]

However, this proves to be very slow.

Is there a way to speed it up by using .groupby()?

morienor

1 Answer


You can use the date column to group your data. First change its data type to pandas Timestamps:

df_test['date'] = pd.to_datetime(df_test['date'])

Then you can use it directly in grouping, for example:

df_test.groupby(df_test['date'].dt.month)['sales'].sum().cumsum()
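
For the question's specific goal (attaching the previous month's total sales to each row), one way to use .groupby() is to aggregate sales per ym once and then map the prev_1_ym / prev_2_ym keys onto that result. A minimal sketch, continuing from the df_test defined in the question and assuming the new columns should be called prev_1m_sales and prev_2m_sales, with months missing from the data counted as 0:

# total sales per year-month, computed once
monthly_sales = df_test.groupby('ym')['sales'].sum()

# look up each row's previous-month key in that Series;
# months not present in the data (e.g. 201611, 201612) become NaN and are filled with 0
df_test['prev_1m_sales'] = df_test['prev_1_ym'].map(monthly_sales).fillna(0)
df_test['prev_2m_sales'] = df_test['prev_2_ym'].map(monthly_sales).fillna(0)

This computes the per-month totals a single time instead of re-filtering the whole frame for every row, which is where the list comprehension loses time.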
Mohammed Khalid