0

I am trying to display the cumulative average revenue for each year my customers did a transaction, based on the Year the customers were onboarded (first transaction).

I think I need to use groupby but I figure out the best way to achieve the desired result. I have an example plot that I wish to replicate or (get something similar):

enter image description here

Each line on the plot represents the 'Year Onboarded'.

I have created a toy data to make it easier:

dataset = {'ClientId': [10,20,20,20,10,5,3,7,5,20,12,5],
           'Year Onboarded': [2019,2017,2018,2017,2020,2019,2017,2018,2019,2020,2018,2017],
           'Year': [2020,2018,2018,2017,2020,2020,2018,2019,2020,2020,2018,2019],
           'Revenue': [100,50,25,30,40,50,200,600,20,40,100,20]}
df = pd.DataFrame(data=dataset)
Killi Mandjaro
  • 145
  • 2
  • 15

1 Answers1

1

If I am correct (since the toy data still shows some inconsistency, i.e. shouldn't the "Year Onboarding" be the same for a given customer?), you are looking for something on the lines of:

df['cumsum'] = df.groupby(['ClientId', 'Year']).Revenue.transform('cumsum')

And this would yield:

result

Jzbach
  • 346
  • 3
  • 17