How to do rolling sum with conditional window criteria on different index levels in Python

Question

I want to do a rolling sum based on different levels of the index but am struggling to make it a reality. Instead of explaining the problem am giving below the demo input and desired output along with the kind of insights am looking for.

So I have multiple brands and each of their sales of various item categories in different year month day grouped by as below. What I want is a dynamic rolling sum at each day level, rolled over a window on Year as asked.

for eg, if someone asks

Demo question 1) Till a certain day(not including that day) what were their last 2 years' sales of that particular category for that particular brand.

I need to be able to answer this for every single day i.e every single row should have a number as shown in Table 2.0.

I want to be able to code in such a way that if the question changes from 2 years to 3 years I just need to change a number. I also need to do the same thing at the month's level.

demo question 2) Till a certain day(not including that day) what was their last 3 months' sale of that particular category for that particular year for that particular brand.

Below is demo input

The tables are grouped by brand,category,year,month,day and sum of sales from a master table which had all the info and sales at hour level each day

Table 1.0

Brand	Category	Year	Month	Day	Sales
ABC	Big Appliances	2021	9	3	0
	Clothing	2021	9	2	0
	Electronics	2020	10	18	2
	Utensils	2020	10	18	0
		2021	9	2	4
				3	0
XYZ	Big Appliances	2012	4	29	7
		2013	4	7	6
	Clothing	2012	4	29	3
	Electronics	2013	4	9	1
				27	2
			5	4	5
		2015	4	27	7
			5	2	2
	Fans	2013	4	14	4
			5	4	0
		2015	4	18	1
			5	17	11
		2016	4	12	18
	Furniture	2012	5	4	1
				8	6
				20	4
		2013	4	5	1
				7	8
				9	2
		2015	4	18	12
				27	15
			5	2	4
				17	3
	Musical-inst	2012	5	18	10
		2013	4	5	6
		2015	4	16	10
				18	0
		2016	4	12	1
				16	13
	Utencils	2012	5	8	2
		2016	4	16	3
				18	2
		2017	4	12	13

Below is desired output for demo question 1 based on the demo table(last 2 years cumsum not including that day)

Table 2.0

Brand	Category	Year	Month	Day	Sales	Conditional Cumsum(till last 2 years)
ABC	Big Appliances	2021	9	3	0	0
	Clothing	2021	9	2	0	0
	Electronics	2020	10	18	2	0
	Utensils	2020	10	18	0	0
		2021	9	2	4	0
				3	0	4
XYZ	Big Appliances	2012	4	29	7	0
		2013	4	7	6	7
	Clothing	2012	4	29	3	0
	Electronics	2013	4	9	1	0
				27	2	1
			5	4	5	3
		2015	4	27	7	8
			5	2	2	15
	Fans	2013	4	14	4	0
			5	4	0	4
		2015	4	18	1	4
			5	17	11	5
		2016	4	12	18	12
	Furniture	2012	5	4	1	0
				8	6	1
				20	4	7
		2013	4	5	1	11
				7	8	12
				9	2	20
		2015	4	18	12	11
				27	15	23
			5	2	4	38
				17	3	42
	Musical-inst	2012	5	18	10	0
		2013	4	5	6	10
		2015	4	16	10	6
				18	0	16
		2016	4	12	1	10
				16	13	11
	Utencils	2012	5	8	2	0
		2016	4	16	3	0
				18	2	3
		2017	4	12	13	5

End thoughts:

The idea is to basically do a rolling window over year column maintaining the 2 years span criteria and keep on summing the sales figures.

P.S I really need a fast solution due to the huge data size and therefore created a .apply function row-wise which I didn't find feasible. A better solution by using some kind of group rolling sum or supporting columns will be really helpful.

input type can be any shoudnt effect the solution. take csv for now — imhaka, Apr 29 '22 at 09:52

score 1 · Accepted Answer · answered Apr 29 '22 at 10:23

1

Here I'm giving a sample solution for the above problem.
I have concidered just onr product so that the solution would be simple

Code:

from datetime import date,timedelta
Input={"Utencils": [[2012,5,8,2],[2016,4,16,3],[2017,4,12,13]]}
Input1=Input["Utencils"]
Limit=timedelta(365*2)
cumsum=0
lis=[]
Tot=[]
for i in range(len(Input1)):
    if(lis):
        while(lis):
            idx=lis[0]
            Y,M,D=Input1[i][:3]
            reqDate=date(Y,M,D)-Limit
            Y,M,D=Input1[idx][:3]
            if(date(Y,M,D)<=reqDate):
                lis.pop(0)
                cumsum-=Input1[idx][3]
            else:
                break
    Tot.append(cumsum)
    lis.append(i)
    cumsum+=Input1[i][3]
print(Tot)

Here Tot would output the required cumsum column for the given data. Output:

[0, 0, 3]

Here you can specify the Time span using Number of days in Limit variable. Hope this solves the problem you are looking for.

answered Apr 29 '22 at 10:23

Kalyan Reddy

326
1
8

If you are satisfied with the above method, I could give a much smaller code which would solve the above in fewer lines. – Kalyan Reddy Apr 29 '22 at 10:26
Hey just allow me some time to go through this and get back – imhaka Apr 29 '22 at 10:35
Hey this looks good, to parse a dataframe having an additional index like in my example do i need to put that in a for loop first? – imhaka May 02 '22 at 19:18
@DevrupBanerjee To use pd.read_csv(), you better have brand name and product name on every row. Then you could easily use df= pd.read_csv() and can get each product dataframe using, df[df['Brand"]=="ABC"] and df[df["Category"]=="Utencils"] to get only perticular rows which your require. – Kalyan Reddy May 03 '22 at 04:19
@DevrupBanerjee To add the cumsum column to you original dataframe use: df["Cumsum"] = list(that we calculated in above process) – Kalyan Reddy May 03 '22 at 04:23
@DevrupBanerjee If my answer solves your main problem do accept my answer so that I could get the bounty lol :))) – Kalyan Reddy May 03 '22 at 04:25
Hey just a small last thing, if instead of cumsum, like shown in the above code if I were to take the cumulative average, while every other principle of calculation remains the same, what must I change in the code? – imhaka May 04 '22 at 13:25
@DevrupBanerjee Tot.append(cumsum/len(lis)) should work I guess in last 4th line – Kalyan Reddy May 04 '22 at 17:10

How to do rolling sum with conditional window criteria on different index levels in Python

1 Answers1