3

I have a dataframe with multiple index and would like to create a rolling sum of some data, but for each id in the index.

For instance, let us say I have two indexes (Firm and Year) and I have some data with name zdata. The working example is the following:

import pandas as pd

# generating data
firms = ['firm1']*5+['firm2']*5
years = [2000+i for i in range(5)]*2
zdata = [1 for i in range(10)]

# Creating the dataframe
mydf  = pd.DataFrame({'firms':firms,'year':years,'zdata':zdata})

# Setting the two indexes
mydf.set_index(['firms','year'],inplace=True)

print(mydf)
             zdata
firms year       
firm1 2000      1
      2001      1
      2002      1
      2003      1
      2004      1
firm2 2000      1
      2001      1
      2002      1
      2003      1
      2004      1

And now, I would like to have a rolling sum that starts over for each firm. However, if I type

new_rolling_df=mydf.rolling(window=2).sum()

print(new_rolling_df)
              zdata
  firms year       
  firm1 2000    NaN
        2001    2.0
        2002    2.0
        2003    2.0
        2004    2.0
  firm2 2000    2.0
        2001    2.0
        2002    2.0
        2003    2.0
        2004    2.0

It doesn't take into account the multiple index and just make a normal rolling sum. Anyone has an idea how I should do (especially since I have even more indexes than 2 (firm, worker, country, year)

Thanks,

Adrien

Adrien
  • 461
  • 5
  • 19

1 Answers1

4

Option 1

mydf.unstack(0).rolling(2).sum().stack().swaplevel(0, 1).sort_index()

enter image description here

Option 2

mydf.groupby(level=0, group_keys=False).rolling(2).sum()

enter image description here

piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • Thank you so much ! I will use the second solution since it takes too much memory to unstack the large dataframe I have. Quick comment, can you explain the "level=0". What does the number 0 refer to ? Especially since I want to replicate this with even more indexes. – Adrien Aug 13 '16 at 16:15
  • 1
    Level zero is the first level of the index. If you group by an index, you must specify the level even if there is only one level. And, it's the only way to tell groupby that I want to group by an index level. – piRSquared Aug 13 '16 at 17:06
  • @piRSquared- Thank you for this info, have had much trouble getting this to work. For me (I think) it was the group_keys that fixed it. Can you explain what this does? Without this I had a column of NaNs. Thank you again! – Andrew L Feb 21 '17 at 01:33